As far as I know, the bucketing sink is currently also limited by
relying on Hadoop's file system abstraction. It is planned to switch to
Flink's file system abstraction which might also improve this situation.
Kostas (in CC) might know more about it.
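For context, whether the sink can truncate at all comes down to whether
the Hadoop FileSystem on the classpath offers a truncate method, which
has to be detected at runtime, roughly along these lines (a simplified
sketch only, not the exact BucketingSink code; see Gary's link further
down in the thread):

    import java.lang.reflect.Method;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TruncateCheck {
        // Hadoop added FileSystem#truncate(Path, long) in 2.7, so it has to be
        // looked up reflectively to stay compatible with older Hadoop versions.
        // If it is missing, the sink can only write a ".valid-length" companion
        // file on restore instead of truncating the part file in place.
        public static Method lookupTruncate() {
            try {
                return FileSystem.class.getMethod("truncate", Path.class, long.class);
            } catch (NoSuchMethodException e) {
                return null; // pre-2.7 Hadoop: fall back to ".valid-length"
            }
        }
    }
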
But I think we can discuss whether another behavior should be configurable
as well. Would you be willing to contribute?
Regards,
Timo
On 15.05.18 at 14:01, Xinyu Zhang wrote:
Thanks for your reply.
Indeed, if a file is very large, it will take a long time. However,
the ".valid-length" file is not convenient for others who use the
data in HDFS.
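For example, every downstream job first has to check for the companion
file and limit its read to the recorded length, along these lines (just
a sketch: the "_<part-file>.valid-length" naming depends on the sink
configuration, it assumes the length is stored as a plain decimal
string, and it reads the whole part file into memory):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ValidLengthReader {
        // Sketch: read a part file, but only up to the length recorded in its
        // ".valid-length" companion file (if one exists). Bytes beyond that
        // length were written after the last successful checkpoint and must be
        // ignored to keep the data exactly-once.
        public static byte[] readValidBytes(FileSystem fs, Path partFile) throws Exception {
            Path validLengthFile =
                new Path(partFile.getParent(), "_" + partFile.getName() + ".valid-length");
            long validLength = fs.getFileStatus(partFile).getLen();
            if (fs.exists(validLengthFile)) {
                try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(fs.open(validLengthFile)))) {
                    validLength = Long.parseLong(reader.readLine().trim());
                }
            }
            byte[] validBytes = new byte[(int) validLength];
            try (FSDataInputStream in = fs.open(partFile)) {
                in.readFully(0, validBytes, 0, (int) validLength);
            }
            return validBytes;
        }
    }
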
Maybe we should provide a configuration for users to choose which
strategy they prefer.
Do you have any ideas?
------------------ Original Message ------------------
*From:* "Timo Walther"<twal...@apache.org>;
*Sent:* Tuesday, May 15, 2018, 7:30 PM;
*To:* "dev"<dev@flink.apache.org>;
*Subject:* Re: Rewriting a new file instead of writing a ".valid-length"
file in BucketingSink when restoring
I guess writing a new file would take much longer than just using the
.valid-length file, especially if the files are very large. The
restore time should be kept as short as possible to ensure little
downtime on restarts.
Regards,
Timo
On 15.05.18 at 09:31, Gary Yao wrote:
> Hi,
>
> The BucketingSink truncates the file if the Hadoop FileSystem supports
> this operation (Hadoop 2.7 and above) [1]. What version of Hadoop are
> you using?
>
> Best,
> Gary
>
> [1]
>
https://github.com/apache/flink/blob/bcd028d75b0e5c5c691e24640a2196b2fdaf85e0/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L301
>
> On Mon, May 14, 2018 at 1:37 PM, Zhang Xinyu <342689...@qq.com> wrote:
>
>> Hi
>>
>>
>> I'm trying to copy data from Kafka to HDFS. The data in HDFS is used
>> by others to do other computations in map/reduce.
>> If some tasks fail, the ".valid-length" file is created for older
>> Hadoop versions. The problem is that other people must know how to
>> deal with the ".valid-length" file; otherwise, the data may not be
>> exactly-once.
>> Hence, why not write a new file when restoring instead of writing a
>> ".valid-length" file? In this way, others who use the data in HDFS
>> don't need to know how to deal with the ".valid-length" file.
>>
>>
>> Thanks!
>>
>>
>> Zhang Xinyu