Ying Xu created FLINK-13027:
-------------------------------
Summary: StreamingFileSink bulk-encoded writer supports file
rolling upon customized events
Key: FLINK-13027
URL: https://issues.apache.org/jira/browse/FLINK-13027
Project: Flink
Issue Type: New Feature
Components: API / DataStream
Reporter: Ying Xu
When writing in a bulk-encoded format such as Parquet, StreamingFileSink only
supports OnCheckpointRollingPolicy, which rolls the file at checkpoint time.
In many scenarios, it would be beneficial for the sink to roll the file upon
certain events, for example, when the file size reaches a limit. Such a rolling
policy could also alleviate some of the side effects of
OnCheckpointRollingPolicy, e.g., most of the heavy lifting, including file
uploading, happens at checkpoint time.
Specifically, this Jira calls for a new rolling policy that rolls the file:
# whenever a customized event happens, e.g., the file size reaches a certain
limit.
# whenever a checkpoint happens. This is needed to provide exactly-once
guarantees when writing bulk-encoded files.
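As a rough illustration of the two conditions above, the following is a minimal, self-contained sketch and not a proposed implementation. The interfaces PartFileStats and SketchRollingPolicy are hypothetical stand-ins that only loosely mirror the shape of Flink's RollingPolicy and PartFileInfo, and the class name SizeOrCheckpointRollingPolicy and the parameter maxPartSizeBytes are likewise assumptions made for this example.

```java
import java.io.Serializable;

// Hypothetical stand-in for part-file metadata (assumption, not the Flink type).
interface PartFileStats {
    long getSize();
}

// Hypothetical stand-in for a rolling-policy contract (assumption, not the Flink type).
interface SketchRollingPolicy<IN> extends Serializable {
    boolean shouldRollOnCheckpoint(PartFileStats partFile);

    boolean shouldRollOnEvent(PartFileStats partFile, IN element);
}

// Rolls on every checkpoint and additionally when the part file reaches a size limit.
final class SizeOrCheckpointRollingPolicy<IN> implements SketchRollingPolicy<IN> {
    private static final long serialVersionUID = 1L;

    private final long maxPartSizeBytes;

    SizeOrCheckpointRollingPolicy(long maxPartSizeBytes) {
        if (maxPartSizeBytes <= 0) {
            throw new IllegalArgumentException("maxPartSizeBytes must be positive");
        }
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    @Override
    public boolean shouldRollOnCheckpoint(PartFileStats partFile) {
        // Condition 2: always roll at a checkpoint, as required for exactly-once
        // guarantees with bulk-encoded files.
        return true;
    }

    @Override
    public boolean shouldRollOnEvent(PartFileStats partFile, IN element) {
        // Condition 1: the customized event, here "file size reached the limit".
        return partFile.getSize() >= maxPartSizeBytes;
    }
}
```

Other customized events (for example, a record count or a marker element) could be checked in shouldRollOnEvent in the same way, as long as shouldRollOnCheckpoint keeps returning true.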
Users of this rolling policy need to be aware that a customized event and the
next checkpoint may occur close to each other, which in the worst case yields
a tiny file per checkpoint.
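The tiny-file caveat can be made concrete with a small simulation. This is a simplified model under stated assumptions, not a description of how StreamingFileSink is implemented: every record has the same size, the size check runs right after each write, and a checkpoint fires after a fixed number of records. The class TinyFileSimulation and all of its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyFileSimulation {

    // Returns the sizes (in bytes) of the part files finished during the run.
    static List<Long> simulate(int records, long recordBytes,
                               long maxPartBytes, int checkpointEvery) {
        List<Long> finished = new ArrayList<>();
        long current = 0L;
        for (int i = 1; i <= records; i++) {
            current += recordBytes;
            // Customized event: roll once the part file reaches the size limit.
            if (current >= maxPartBytes) {
                finished.add(current);
                current = 0L;
            }
            // Checkpoint: roll whatever has been written so far, however small.
            if (i % checkpointEvery == 0 && current > 0) {
                finished.add(current);
                current = 0L;
            }
        }
        // Flush the in-progress file at the end of the run.
        if (current > 0) {
            finished.add(current);
        }
        return finished;
    }
}
```

With 22 records of 10 bytes each, a 100-byte size limit, and a checkpoint every 11 records, simulate(22, 10L, 100L, 11) returns [100, 10, 100, 10]: each size-triggered roll is followed one record later by a checkpoint that closes a 10-byte file.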