[
https://issues.apache.org/jira/browse/FLINK-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-13027:
-----------------------------------
Labels: pull-request-available (was: )
> StreamingFileSink bulk-encoded writer supports file rolling upon customized
> events
> ----------------------------------------------------------------------------------
>
> Key: FLINK-13027
> URL: https://issues.apache.org/jira/browse/FLINK-13027
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream
> Reporter: Ying Xu
> Assignee: Ying Xu
> Priority: Major
> Labels: pull-request-available
>
> When writing in a bulk-encoded format such as Parquet, StreamingFileSink only
> supports OnCheckpointRollingPolicy, which rolls the part file at checkpoint time.
>
> In many scenarios, it is beneficial for the sink to roll files upon certain
> events, for example, when the file size reaches a limit. Such a rolling
> policy can also alleviate some of the side effects of
> OnCheckpointRollingPolicy, e.g., most of the heavy lifting, including file
> uploading, happens at checkpoint time.
> Specifically, this Jira calls for a new rolling policy that rolls the file:
> # whenever a customized event happens, e.g., the file size reaches a certain
> limit.
> # whenever a checkpoint happens. This is needed to provide exactly-once
> guarantees when writing bulk-encoded files.
> Users of this rolling policy need to be aware that the customized event and
> the next checkpoint may occur close together, which in the worst case yields
> one tiny file per checkpoint.
>
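The two rules in the description can be sketched as follows. This is a minimal, self-contained illustration and not Flink's implementation: the class name SizeOrCheckpointRollingPolicy, the byte-count parameters, and the simulate helper are all hypothetical, and the part file is modeled only by its current size. The sketch assumes the event check runs before each record is written.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the proposed policy. It does not implement
 * Flink's RollingPolicy interface; a part file is represented only by
 * its current size in bytes.
 */
final class SizeOrCheckpointRollingPolicy {

    private final long maxPartSizeBytes;

    SizeOrCheckpointRollingPolicy(long maxPartSizeBytes) {
        if (maxPartSizeBytes <= 0) {
            throw new IllegalArgumentException("maxPartSizeBytes must be positive");
        }
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    /** Rule 2: always roll on a checkpoint, as exactly-once requires for bulk formats. */
    boolean shouldRollOnCheckpoint(long currentPartSizeBytes) {
        return true;
    }

    /** Rule 1: roll on the customized event, here "the file size reached the limit". */
    boolean shouldRollOnEvent(long currentPartSizeBytes) {
        return currentPartSizeBytes >= maxPartSizeBytes;
    }

    /**
     * Toy simulation (assumption): writes recordCount records of equal size,
     * then takes one checkpoint. Returns the sizes of the rolled part files.
     */
    static List<Long> simulate(long maxPartSizeBytes, long recordSizeBytes, int recordCount) {
        SizeOrCheckpointRollingPolicy policy = new SizeOrCheckpointRollingPolicy(maxPartSizeBytes);
        List<Long> rolledPartSizes = new ArrayList<>();
        long inProgressBytes = 0L;

        for (int i = 0; i < recordCount; i++) {
            // Check the event-based rule before writing the next record.
            if (policy.shouldRollOnEvent(inProgressBytes)) {
                rolledPartSizes.add(inProgressBytes);
                inProgressBytes = 0L;
            }
            inProgressBytes += recordSizeBytes;
        }

        // Checkpoint: roll whatever is in progress, however small it is.
        if (inProgressBytes > 0 && policy.shouldRollOnCheckpoint(inProgressBytes)) {
            rolledPartSizes.add(inProgressBytes);
        }
        return rolledPartSizes;
    }

    public static void main(String[] args) {
        System.out.println(simulate(100L, 40L, 4));
    }
}
```

With a 100-byte limit and four 40-byte records followed by a checkpoint, the simulation rolls a 120-byte file on the size event and then a 40-byte file at the checkpoint. The second file is an instance of the tiny-file caveat noted in the description.
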
--
This message was sent by Atlassian Jira
(v8.3.4#803005)