[ 
https://issues.apache.org/jira/browse/FLINK-35521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

EMERSON WANG updated FLINK-35521:
---------------------------------
    Summary: Flink FileSystem SQL Connector Generating SUCCESS File Multiple 
Times  (was: Flink FileSystem SQL Connector Generating SUCESS File Multiple 
Times)

> Flink FileSystem SQL Connector Generating SUCCESS File Multiple Times
> ---------------------------------------------------------------------
>
>                 Key: FLINK-35521
>                 URL: https://issues.apache.org/jira/browse/FLINK-35521
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.18.1
>         Environment: Our PyFlink SQL jobs run in an AWS EKS environment.
>            Reporter: EMERSON WANG
>            Priority: Major
>
> Our Flink Table SQL job reads data from Kafka streams and writes all
> partitioned data as Parquet files under the same S3 folder through the
> filesystem SQL connector.
> For the S3 filesystem SQL connector, sink.partition-commit.policy.kind was
> set to 'success-file' and sink.partition-commit.trigger was set to
> 'partition-time'. We found that the _SUCCESS file in the S3 folder was
> generated multiple times after multiple partitions were committed.
> Because all partitioned Parquet files and the _SUCCESS file are in the same
> S3 folder, and the _SUCCESS file is used to trigger the downstream
> application, we would like the _SUCCESS file to be generated only once,
> after all partitions are committed and all Parquet files are ready to be
> processed. A single _SUCCESS file could then trigger the downstream
> application once instead of multiple times.
> We know we could set sink.partition-commit.trigger to 'process-time' to
> generate the _SUCCESS file only once in the S3 folder; however,
> 'process-time' would not meet our business requirements.
> We request that the FileSystem SQL connector support the following new use
> case:
> Even if sink.partition-commit.trigger is set to 'partition-time', the
> _SUCCESS file is generated only once, after all partitions are committed and
> all output files are ready to be processed, so that it triggers the
> downstream application only once instead of multiple times.
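
For reference, the sink configuration described above might look roughly like the following. This is a minimal sketch, not the reporter's actual job: the table name, columns, S3 path, partition column `dt`, the `partition.time-extractor.timestamp-pattern` value, and the `sink.partition-commit.delay` value are all assumptions for illustration. Only the two `sink.partition-commit.*` settings for the trigger and the policy kind come from the report.

```python
# Sketch of a filesystem sink DDL with the two settings named in the report.
# Everything marked "assumed" is a placeholder, not taken from the issue.
SINK_OPTIONS = {
    "connector": "filesystem",
    "path": "s3://example-bucket/output/",  # assumed placeholder path
    "format": "parquet",
    # Assumed: derive the partition time from a daily 'dt' partition value.
    "partition.time-extractor.timestamp-pattern": "$dt 00:00:00",
    "sink.partition-commit.delay": "1 d",  # assumed delay
    # From the report: commit by partition time and write a success file.
    "sink.partition-commit.trigger": "partition-time",
    "sink.partition-commit.policy.kind": "success-file",
}


def build_sink_ddl(table_name, options):
    """Render a CREATE TABLE statement with the given connector options."""
    with_clause = ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())
    return (
        f"CREATE TABLE {table_name} (\n"
        "  user_id STRING,\n"  # assumed columns
        "  amount DOUBLE,\n"
        "  dt STRING\n"
        ") PARTITIONED BY (dt) WITH (\n"
        f"{with_clause}\n"
        ")"
    )


# 'parquet_sink' is a hypothetical table name.
SINK_DDL = build_sink_ddl("parquet_sink", SINK_OPTIONS)

if __name__ == "__main__":
    print(SINK_DDL)
```

In a PyFlink job the resulting string would be passed to `table_env.execute_sql(SINK_DDL)`. The 'partition-time' trigger also depends on watermarks defined on the source table, which this sketch leaves out.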



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
