As far as I know, the bucketing sink is currently also limited by
relying on Hadoop's file system abstraction. It is planned to switch to
Flink's file system abstraction which might also improve this situation.
Kostas (in CC) might know more about it.
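For context, whether the sink can truncate at all comes down to whether
the Hadoop FileSystem on the classpath offers a truncate method, which
has to be detected at runtime, roughly along these lines (a simplified
sketch only, not the exact BucketingSink code; see Gary's link further
down in the thread):

    import java.lang.reflect.Method;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TruncateCheck {
        // Hadoop added FileSystem#truncate(Path, long) in 2.7, so it has to be
        // looked up reflectively to stay compatible with older Hadoop versions.
        // If it is missing, the sink can only write a ".valid-length" companion
        // file on restore instead of truncating the part file in place.
        public static Method lookupTruncate() {
            try {
                return FileSystem.class.getMethod("truncate", Path.class, long.class);
            } catch (NoSuchMethodException e) {
                return null; // pre-2.7 Hadoop: fall back to ".valid-length"
            }
        }
    }
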
But I think we can discuss whether another behavior should be configurable
as well. Would you be willing to contribute?
Regards,
Timo
On 15.05.18 at 14:01, Xinyu Zhang wrote:
Thanks for your reply.
Indeed, if a file is very large, it will take a long time. However,
the ".valid-length" file is not convenient for others who use the
data in HDFS.
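For example, every downstream job first has to check for the companion
file and limit its read to the recorded length, along these lines (just
a sketch: the "_<part-file>.valid-length" naming depends on the sink
configuration, it assumes the length is stored as a plain decimal
string, and it reads the whole part file into memory):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ValidLengthReader {
        // Sketch: read a part file, but only up to the length recorded in its
        // ".valid-length" companion file (if one exists). Bytes beyond that
        // length were written after the last successful checkpoint and must be
        // ignored to keep the data exactly-once.
        public static byte[] readValidBytes(FileSystem fs, Path partFile) throws Exception {
            Path validLengthFile =
                new Path(partFile.getParent(), "_" + partFile.getName() + ".valid-length");
            long validLength = fs.getFileStatus(partFile).getLen();
            if (fs.exists(validLengthFile)) {
                try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(fs.open(validLengthFile)))) {
                    validLength = Long.parseLong(reader.readLine().trim());
                }
            }
            byte[] validBytes = new byte[(int) validLength];
            try (FSDataInputStream in = fs.open(partFile)) {
                in.readFully(0, validBytes, 0, (int) validLength);
            }
            return validBytes;
        }
    }
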
Maybe we should provide a configuration for users to choose which
strategy they prefer.
Do you have any ideas?
------------------ Original Message ------------------
*From:* "Timo Walther"<twal...@apache.org>;
*Sent:* Tuesday, May 15, 2018, 7:30 PM;
*To:* "dev"<dev@flink.apache.org>;
*Subject:* Re: Rewriting a new file instead of writing a ".valid-length"
file in BucketingSink when restoring
I guess writing a new file would take much longer than just using the
.valid-length file, especially if the files are very large. The
restore time should be kept as short as possible to ensure little
downtime on restarts.
Regards,
Timo
On 15.05.18 at 09:31, Gary Yao wrote:
> Hi,
>
> The BucketingSink truncates the file if the Hadoop FileSystem supports
> this operation (Hadoop 2.7 and above) [1]. What version of Hadoop are
> you using?
>
> Best,
> Gary
>
> [1]
>
https://github.com/apache/flink/blob/bcd028d75b0e5c5c691e24640a2196b2fdaf85e0/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L301
>
> On Mon, May 14, 2018 at 1:37 PM, Zhang Xinyu <342689...@qq.com> wrote:
>
>> Hi
>>
>>
>> I'm trying to copy data from Kafka to HDFS. The data in HDFS is used
>> by others to do other computations in map/reduce.
>> If some tasks fail, the ".valid-length" file is created for older
>> Hadoop versions. The problem is that other people must know how to
>> deal with the ".valid-length" file; otherwise, the data may not be
>> exactly-once.
>> Hence, why not write a new file when restoring instead of writing a
>> ".valid-length" file? In this way, others who use the data in HDFS
>> don't need to know how to deal with the ".valid-length" file.
>>
>>
>> Thanks!
>>
>>
>> Zhang Xinyu