Wenlong Lyu created FLINK-5284:
----------------------------------
Summary: Make output of bucketing sink compatible with other
processing frameworks like MapReduce
Key: FLINK-5284
URL: https://issues.apache.org/jira/browse/FLINK-5284
Project: Flink
Issue Type: Improvement
Components: filesystem-connector
Reporter: Wenlong Lyu
Assignee: Wenlong Lyu
Currently, the bucketing sink cannot move in-progress and pending files to the
final output when the stream finishes. In addition, after recovery the current
output file may contain invalid content, which can only be identified through
the file-length meta file. Together, these make the final output of the job
incompatible with other processing frameworks such as MapReduce. There are two
things to do to solve the problem:
1. Add a direct output option to the bucketing sink, which writes output
directly to the final file and deletes/truncates that file on failover. Direct
output will be especially useful for finite stream jobs, since it enables users
to migrate their batch jobs to streaming and take advantage of features such as
checkpointing.
2. Add a truncate-by-copy option that lets the bucketing sink resize an output
file by copying the valid content of the current file, instead of creating a
length meta file. Truncate by copy incurs extra IO, but leaves the output
cleaner, because the file itself contains only valid content.
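
The truncate-by-copy idea in item 2 could look roughly like the sketch below.
It is only an illustration on a local file using java.nio, not the bucketing
sink's code: the class name TruncateByCopy, the method truncateByCopy, and the
".truncating" suffix are hypothetical, and a real implementation would
presumably go through the Hadoop FileSystem API and take the valid length from
the restored checkpoint state.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not part of Flink: "truncate by copy" on a local file. */
final class TruncateByCopy {

    /** Shrinks file to its first validLength bytes by copying them to a sibling file. */
    static void truncateByCopy(Path file, long validLength) throws IOException {
        if (validLength < 0 || validLength > Files.size(file)) {
            throw new IllegalArgumentException("validLength out of range: " + validLength);
        }
        // Write the copy next to the original so the final move stays in one directory.
        Path tmp = file.resolveSibling(file.getFileName() + ".truncating");
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            long remaining = validLength;
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    throw new IOException("File ended before the valid length was reached");
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
        // Replace the original with the copy that holds only the valid prefix.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-0-0", ".txt");
        // Pretend only the first 9 bytes were covered by the last checkpoint.
        Files.write(part, "committed|uncommitted".getBytes(StandardCharsets.UTF_8));
        truncateByCopy(part, "committed".length());
        System.out.println(new String(Files.readAllBytes(part), StandardCharsets.UTF_8));
    }
}
```

After the call, the file holds only the valid prefix, so a downstream reader
needs no separate length meta file. The cost is the extra IO mentioned above:
every valid byte is read and written once more.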
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)