yxu-valleytider opened a new pull request #10653: [FLINK-13027][streaming]: StreamingFileSink bulk-encoded writer supports customized checkpoint policy

URL: https://github.com/apache/flink/pull/10653

## What is the purpose of the change

This PR allows a bulk-encoded `StreamingFileSink` to be instantiated with any member of a generic family of rolling policies that roll files at checkpoint time. It does so by defining a base `CheckpointRollingPolicy`, which is extended by the existing `OnCheckpointRollingPolicy` and by a new rolling policy, `FSizeCheckpointRollingPolicy`. The new policy rolls a part file not only at checkpoint time but possibly also earlier, once the file size reaches a certain limit, which is useful for preventing part files from growing too large.

The recurrent (self-typed) builder pattern described in [[1](https://community.oracle.com/blogs/emcmanus/2010/10/24/using-builder-pattern-subclasses)] and [[2](https://stackoverflow.com/questions/17164375/subclassing-a-java-builder-class)] is used to instantiate the rolling policies wherever appropriate, so that each individual rolling policy is also extensible.

## Brief change log

**CheckpointRollingPolicy** - An abstract class implementing the base rolling policy, which rolls the part file at every checkpoint.

**FSizeCheckpointRollingPolicy** - A new rolling policy implementation that rolls the part file when its size exceeds a limit, *in addition to* rolling at every checkpoint.

**StreamingFileSink** - The bulk-encoded sink writer (`forBulkFormat()`) takes a generic `CheckpointRollingPolicy` during instantiation. `OnCheckpointRollingPolicy` is still the default, but is no longer the only option.

## Verifying this change

This change is an interface change and is already covered by existing tests, such as `LocalStreamingFileSinkTest` and `BulkWriterTest`.

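
To make the class relationships in the change log concrete, here is a minimal, self-contained sketch of the policy hierarchy. It does not use Flink: the `PartFileInfo` and `RollingPolicy` interfaces below are simplified, hypothetical stand-ins, and the constructor taking `maxPartSize` is an assumption made for illustration. The real interfaces and the code in this PR may differ.

```java
// Simplified stand-ins for illustration only; these are NOT Flink's actual interfaces.
public class RollingPolicySketch {

    /** Hypothetical minimal view of an in-progress part file. */
    interface PartFileInfo {
        long getSize();
    }

    /** Hypothetical, reduced policy contract with two decision points. */
    interface RollingPolicy {
        boolean shouldRollOnCheckpoint(PartFileInfo partFile);

        boolean shouldRollOnEvent(PartFileInfo partFile);
    }

    /** Base class: every policy in this family rolls at each checkpoint. */
    abstract static class CheckpointRollingPolicy implements RollingPolicy {
        @Override
        public final boolean shouldRollOnCheckpoint(PartFileInfo partFile) {
            return true;
        }
    }

    /** Rolls only at checkpoints, never in between. */
    static class OnCheckpointRollingPolicy extends CheckpointRollingPolicy {
        @Override
        public boolean shouldRollOnEvent(PartFileInfo partFile) {
            return false;
        }
    }

    /** Rolls at checkpoints and also when the part file exceeds maxPartSize bytes. */
    static class FSizeCheckpointRollingPolicy extends CheckpointRollingPolicy {
        private final long maxPartSize; // assumed parameter name

        FSizeCheckpointRollingPolicy(long maxPartSize) {
            this.maxPartSize = maxPartSize;
        }

        @Override
        public boolean shouldRollOnEvent(PartFileInfo partFile) {
            return partFile.getSize() > maxPartSize;
        }
    }

    public static void main(String[] args) {
        RollingPolicy policy = new FSizeCheckpointRollingPolicy(1024L);
        PartFileInfo small = () -> 512L;
        PartFileInfo large = () -> 2048L;

        System.out.println(policy.shouldRollOnEvent(small));      // false
        System.out.println(policy.shouldRollOnEvent(large));      // true
        System.out.println(policy.shouldRollOnCheckpoint(small)); // true
    }
}
```

Marking `shouldRollOnCheckpoint` as `final` in the base class is one way to guarantee that every subclass keeps the roll-at-checkpoint behavior; whether the PR enforces it this way is not stated in the description.
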
## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (yes) minor interface change to `StreamingFileSink`.

## Documentation

- Does this pull request introduce a new feature? (no)

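
For reference, the self-typed builder pattern cited in the purpose section can be sketched as follows. This is only an illustration of the general technique: every class name, field, and setter below (`SizePolicyBuilder`, `TimedSizePolicyBuilder`, `withMaxPartSize`, `withRolloverInterval`) is hypothetical and is not taken from this PR or from Flink.

```java
// Hypothetical classes illustrating a builder parameterized on its own subtype.
public class SelfTypedBuilderSketch {

    /** Policy with one setting; stands in for a size-based policy. */
    static class SizePolicy {
        final long maxPartSize;

        SizePolicy(SizePolicyBuilder<?> builder) {
            this.maxPartSize = builder.maxPartSize;
        }
    }

    /** Base builder: setters return T, the concrete builder type. */
    abstract static class SizePolicyBuilder<T extends SizePolicyBuilder<T>> {
        long maxPartSize = 128L * 1024L * 1024L; // arbitrary default

        /** Each concrete builder returns itself here. */
        abstract T self();

        T withMaxPartSize(long maxPartSize) {
            this.maxPartSize = maxPartSize;
            return self();
        }

        abstract SizePolicy build();
    }

    /** Extended policy adding a second, hypothetical setting. */
    static class TimedSizePolicy extends SizePolicy {
        final long rolloverIntervalMs;

        TimedSizePolicy(TimedSizePolicyBuilder builder) {
            super(builder);
            this.rolloverIntervalMs = builder.rolloverIntervalMs;
        }
    }

    /** Concrete builder: binds T to itself and adds its own setter. */
    static final class TimedSizePolicyBuilder
            extends SizePolicyBuilder<TimedSizePolicyBuilder> {
        long rolloverIntervalMs = 60_000L; // arbitrary default

        @Override
        TimedSizePolicyBuilder self() {
            return this;
        }

        TimedSizePolicyBuilder withRolloverInterval(long rolloverIntervalMs) {
            this.rolloverIntervalMs = rolloverIntervalMs;
            return this;
        }

        @Override
        TimedSizePolicy build() {
            return new TimedSizePolicy(this);
        }
    }

    public static void main(String[] args) {
        TimedSizePolicy policy = new TimedSizePolicyBuilder()
                .withMaxPartSize(1024L)       // inherited setter, still returns the subclass builder
                .withRolloverInterval(5_000L) // so the subclass setter remains chainable
                .build();
        System.out.println(policy.maxPartSize + " " + policy.rolloverIntervalMs); // prints "1024 5000"
    }
}
```

Without the type parameter, `withMaxPartSize` would return the base builder type and the chained call to `withRolloverInterval` would not compile, which is the extensibility problem the pattern addresses.
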