[
https://issues.apache.org/jira/browse/FLINK-35521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
EMERSON WANG updated FLINK-35521:
---------------------------------
Summary: Flink FileSystem SQL Connector Generating SUCCESS File Multiple
Times (was: Flink FileSystem SQL Connector Generating SUCESS File Multiple
Times)
> Flink FileSystem SQL Connector Generating SUCCESS File Multiple Times
> ---------------------------------------------------------------------
>
> Key: FLINK-35521
> URL: https://issues.apache.org/jira/browse/FLINK-35521
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.18.1
> Environment: Our PyFlink SQL jobs are running in AWS EKS environment.
> Reporter: EMERSON WANG
> Priority: Major
>
> Our Flink Table SQL job received data from Kafka streams and then wrote all
> partitioned data to the associated Parquet files under the same S3 folder
> through the filesystem SQL connector.
> For the S3 filesystem SQL connector, sink.partition-commit.policy.kind was
> set to 'success-file' and sink.partition-commit.trigger was set to
> 'partition-time'. We found that the _SUCCESS file in the S3 folder was
> generated multiple times as multiple partitions were committed.
> Because all partitioned Parquet files and the _SUCCESS file are in the same
> S3 folder, and the _SUCCESS file is used to trigger the downstream
> application, we would like the _SUCCESS file to be generated only once,
> after all partitions are committed and all Parquet files are ready to be
> processed. A single _SUCCESS file could then trigger the downstream
> application only once instead of multiple times.
> We know we could set sink.partition-commit.trigger to 'process-time' to
> generate the _SUCCESS file only once in the S3 folder; however,
> 'process-time' would not meet our business requirements.
> We request that the FileSystem SQL connector support the following new use
> case:
> Even if sink.partition-commit.trigger is set to 'partition-time', the
> _SUCCESS file is generated only once, after all partitions are committed and
> all output files are ready to be processed, so that it triggers the
> downstream application only once instead of multiple times.
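
For readers who want to reproduce the setup, the sink configuration described in the report might look roughly like the PyFlink sketch below. Only the two sink.partition-commit.* values ('partition-time' and 'success-file'), the Parquet format, and the use of S3 come from the report. The table name, columns, partition keys, S3 path, timestamp pattern, and commit delay are assumptions made for illustration, and create_sink is a hypothetical helper.

```python
# Sketch of a filesystem sink resembling the one described in the report.
# Assumed for illustration: table name, columns, partition keys (dt, hr),
# the S3 path, the timestamp pattern, and the commit delay.
SINK_DDL = """
CREATE TABLE parquet_sink (
    user_id STRING,
    amount DOUBLE,
    dt STRING,
    hr STRING
) PARTITIONED BY (dt, hr) WITH (
    'connector' = 'filesystem',
    'path' = 's3://example-bucket/output/',
    'format' = 'parquet',
    'partition.time-extractor.timestamp-pattern' = '$dt $hr:00:00',
    'sink.partition-commit.trigger' = 'partition-time',
    'sink.partition-commit.delay' = '1 h',
    'sink.partition-commit.policy.kind' = 'success-file'
)
"""


def create_sink(table_env):
    """Register the sink on an existing PyFlink TableEnvironment.

    Hypothetical helper: it only forwards the DDL to execute_sql.
    """
    table_env.execute_sql(SINK_DDL)
```

With the 'partition-time' trigger, the commit policy runs when a partition is committed, which matches the report's observation that the _SUCCESS file was generated multiple times as multiple partitions were committed.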
--
This message was sent by Atlassian Jira
(v8.20.10#820010)