Ying Xu created FLINK-13027:
-------------------------------

             Summary: StreamingFileSink bulk-encoded writer supports file 
rolling upon customized events
                 Key: FLINK-13027
                 URL: https://issues.apache.org/jira/browse/FLINK-13027
             Project: Flink
          Issue Type: New Feature
          Components: API / DataStream
            Reporter: Ying Xu


When writing in a bulk-encoded format such as Parquet, StreamingFileSink only 
supports OnCheckpointRollingPolicy, which rolls files at checkpoint time.

In many scenarios, it is beneficial for the sink to roll files upon certain 
events, for example, when the file size reaches a limit. Such a rolling policy 
can also potentially alleviate some of the side effects of 
OnCheckpointRollingPolicy, e.g., most of the heavy lifting, including file 
uploading, happens at checkpoint time.

Specifically, this Jira calls for a new rolling policy that rolls files: 
 # whenever a customized event happens, e.g., the file size reaches a certain 
limit. 
 # whenever a checkpoint happens. This is needed for providing exactly-once 
guarantees when writing bulk-encoded files.
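
The two rules above can be sketched as a pair of decision methods. This is a 
minimal, standalone illustration with no Flink dependency, not the proposed 
implementation: the class name SizeOrCheckpointRollingPolicy, the 
maxPartSizeBytes parameter, and the use of a plain long for the current part 
file size are all assumptions made for this example. An actual policy would 
presumably implement Flink's RollingPolicy interface instead.

```java
// Hypothetical sketch: every name below is illustrative and not part of Flink.
final class SizeOrCheckpointRollingPolicy {

    // Assumed threshold for the "customized event": part file size in bytes.
    private final long maxPartSizeBytes;

    SizeOrCheckpointRollingPolicy(long maxPartSizeBytes) {
        if (maxPartSizeBytes <= 0) {
            throw new IllegalArgumentException("maxPartSizeBytes must be positive");
        }
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    // Rule 2: always roll when a checkpoint happens, regardless of size,
    // which is what the issue requires for exactly-once guarantees.
    boolean shouldRollOnCheckpoint(long currentPartSizeBytes) {
        return true;
    }

    // Rule 1: roll when the customized event fires; here the event is
    // "the part file has reached the configured size limit".
    boolean shouldRollOnEvent(long currentPartSizeBytes) {
        return currentPartSizeBytes >= maxPartSizeBytes;
    }
}
```

With a limit of 1024 bytes, shouldRollOnEvent returns false for a 1023-byte 
part file and true once it reaches 1024 bytes, while shouldRollOnCheckpoint 
returns true even for an empty part file.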

Users of this rolling policy need to be aware that a customized event and the 
next checkpoint may occur close to each other, which in the worst case yields 
a tiny file per checkpoint.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
