I suspect rewriting a whole new file would take much longer than just writing the .valid-length file, especially if the files are very large. Restore time should be kept as short as possible to minimize downtime on restarts.

Regards,
Timo


On 15.05.18 at 09:31, Gary Yao wrote:
Hi,

The BucketingSink truncates the file if the Hadoop FileSystem supports this
operation (Hadoop 2.7 and above) [1]. What version of Hadoop are you using?

Best,
Gary

[1]
https://github.com/apache/flink/blob/bcd028d75b0e5c5c691e24640a2196b2fdaf85e0/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L301
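To make that concrete, here is a minimal sketch of what a truncate-based restore
can look like on Hadoop 2.7+, using FileSystem#truncate. The class and method
names are illustrative, not the actual BucketingSink internals:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateRestore {

    // Cuts 'file' back to 'validLength' bytes, i.e. the length recorded in
    // the last successful checkpoint. truncate() returning false means HDFS
    // is still recovering the last block asynchronously, so we poll the
    // file length until it matches.
    static void restoreToValidLength(FileSystem fs, Path file, long validLength)
            throws Exception {
        boolean done = fs.truncate(file, validLength);
        while (!done) {
            Thread.sleep(500L);
            done = fs.getFileStatus(file).getLen() == validLength;
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        restoreToValidLength(fs, new Path(args[0]), Long.parseLong(args[1]));
    }
}

On older Hadoop versions the truncate call does not exist, which is why the
sink falls back to writing the ".valid-length" marker file instead.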

On Mon, May 14, 2018 at 1:37 PM, 张馨予 <342689...@qq.com> wrote:

Hi


I'm trying to copy data from Kafka to HDFS. The data in HDFS is then used by
other people for further map/reduce computations.
If a task fails, a ".valid-length" file is created on older Hadoop versions.
The problem is that everyone who reads the data must know how to handle the
".valid-length" file; otherwise, the data may not be exactly-once.
Hence, why not rewrite a new file when restoring instead of writing a
".valid-length" file? That way, people who use the data in HDFS wouldn't need
to know how to deal with the ".valid-length" file.
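For illustration, this is roughly what every consumer has to do today. It is
only a sketch: it assumes BucketingSink's default naming for the companion
file (an "_" prefix and ".valid-length" suffix) and that the file stores the
valid byte count as a string written with writeUTF:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ValidLengthAwareReader {

    // Returns how many bytes of 'dataFile' are safe to read. Bytes past this
    // limit were written after the last successful checkpoint and must be
    // ignored to preserve exactly-once semantics.
    static long validLength(FileSystem fs, Path dataFile) throws Exception {
        Path lengthFile = new Path(dataFile.getParent(),
                "_" + dataFile.getName() + ".valid-length");
        if (!fs.exists(lengthFile)) {
            return fs.getFileStatus(dataFile).getLen(); // no marker: whole file is valid
        }
        try (FSDataInputStream in = fs.open(lengthFile)) {
            return Long.parseLong(in.readUTF()); // assumes writeUTF(Long.toString(len))
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dataFile = new Path(args[0]);
        long remaining = validLength(fs, dataFile);
        byte[] buf = new byte[8192];
        try (FSDataInputStream in = fs.open(dataFile)) {
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) break;
                remaining -= n;
                // process buf[0..n) here
            }
        }
    }
}

Every map/reduce job that reads the data needs this extra step, which is
exactly the burden I'd like to remove.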


Thanks!


Zhang Xinyu

