[jira] [Created] (FLINK-5284) Make output of bucketing sink compatible with other processing framework like mapreduce

Wenlong Lyu (JIRA) Thu, 08 Dec 2016 02:01:25 -0800

Wenlong Lyu created FLINK-5284:
----------------------------------

             Summary: Make output of bucketing sink compatible with other 
processing framework like mapreduce
                 Key: FLINK-5284
                 URL: https://issues.apache.org/jira/browse/FLINK-5284
             Project: Flink
          Issue Type: Improvement
          Components: filesystem-connector
            Reporter: Wenlong Lyu
            Assignee: Wenlong Lyu



Currently the bucketing sink cannot move in-progress and pending files to the
final output when the stream finishes. In addition, after recovery the current
output file may contain invalid content, which can only be identified through
the file-length meta file. Together, these make the final output of the job
incompatible with other processing frameworks such as MapReduce. Two changes
would solve the problem:
1. Add a direct-output option to the bucketing sink, which writes output
directly to the final file and deletes or truncates that file on failover.
Direct output would be especially useful for finite-stream jobs, because it
lets users migrate their batch jobs to streaming and take advantage of
features such as checkpointing.
2. Add a truncate-by-copy option that lets the bucketing sink resize an output
file by copying the valid content of the current file instead of creating a
length meta file. Truncate by copy costs some extra I/O, but leaves the output
cleaner.
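
To illustrate the truncate-by-copy idea in item 2, here is a minimal sketch. It
is not the sink's implementation: it assumes a local file and uses plain
java.nio rather than the Hadoop FileSystem API the sink works with, and the
class name TruncateByCopy, the method truncateByCopy, and the ".truncating"
suffix are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Flink: illustrates truncate by copy
// on a local file system only.
public class TruncateByCopy {

    /**
     * Keeps only the first validLength bytes of the file by copying them
     * into a temporary sibling file and then renaming it over the original.
     */
    public static void truncateByCopy(Path file, long validLength) throws IOException {
        if (validLength < 0 || validLength > Files.size(file)) {
            throw new IllegalArgumentException("validLength out of range: " + validLength);
        }

        // Temporary file next to the original (suffix is an assumption).
        Path tmp = file.resolveSibling(file.getFileName() + ".truncating");

        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            long remaining = validLength;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException("Unexpected end of file while copying");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }

        // Replace the original, so readers see a file with valid content only.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The extra read and write of the valid prefix is the additional I/O mentioned
above. In exchange, downstream jobs do not need to know about a separate
length meta file.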




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
