[
https://issues.apache.org/jira/browse/FLINK-35536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Juliusz Nadberezny updated FLINK-35536:
---------------------------------------
Attachment: RecordWiseFileCompactorSpecificAvroReaderFactory.java
> FileSystem sink on S3 produces invalid Avros when compaction is turned off
> --------------------------------------------------------------------------
>
> Key: FLINK-35536
> URL: https://issues.apache.org/jira/browse/FLINK-35536
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem
> Affects Versions: 1.19.0
> Reporter: Juliusz Nadberezny
> Priority: Major
> Attachments: FileSink.java,
> RecordWiseFileCompactorSpecificAvroReaderFactory.java
>
>
> Compaction in the FileSystem sink on S3 uses the multipart upload process.
> When compaction is turned on, everything works as expected and the sink
> produces correct files.
> The problem occurs when you disable compaction for a sink that previously had
> it enabled. In this case, files that were being held by a multipart upload
> and are then "released" with CompleteMultipartUpload are broken.
> The broken Avro files seem to have the Avro schema duplicated at the
> beginning of the file.
>
> Attached please find:
> 1. Implementation of RecordWiseFileCompactor.Reader.Factory that we are using.
> 2. The FileSink definition.
>
> Steps to reproduce:
> 1. Deploy a job with a FileSystem sink that has compaction enabled and
> writes to S3/MinIO.
> 2. Wait for the job to produce some output.
> 3. Redeploy the job with compaction disabled.
> 4. Wait for the multipart upload to complete and verify the released files.
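
For the verification in step 4, a rough check along the following lines may help flag affected files after downloading them locally. This is only a heuristic sketch and is not taken from the attachments: the class and method names (AvroHeaderCheck, looksDuplicated, and so on) are hypothetical, and it assumes the duplication shows up as a second "avro.schema" metadata key near the start of the file. A record whose payload happens to contain that string would be a false positive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flink or Avro.
class AvroHeaderCheck {
    // Avro object container files start with the four bytes 'O', 'b', 'j', 1.
    static final byte[] MAGIC = {'O', 'b', 'j', 1};
    // Header metadata key under which the writer schema is stored.
    static final byte[] SCHEMA_KEY = "avro.schema".getBytes(StandardCharsets.US_ASCII);

    // Counts how many times pattern occurs in data.
    public static int countOccurrences(byte[] data, byte[] pattern) {
        int count = 0;
        for (int i = 0; i + pattern.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    // True if data begins with the Avro container magic bytes.
    public static boolean startsWithMagic(byte[] data) {
        if (data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Heuristic: an Avro file whose prefix contains the schema key more than once.
    public static boolean looksDuplicated(byte[] prefix) {
        return startsWithMagic(prefix) && countOccurrences(prefix, SCHEMA_KEY) > 1;
    }

    public static void main(String[] args) throws IOException {
        int limit = 1 << 20; // assumption: scanning the first 1 MiB is enough
        for (String arg : args) {
            byte[] prefix;
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                prefix = in.readNBytes(limit); // requires Java 11+
            }
            String verdict = !startsWithMagic(prefix) ? "not an Avro container file"
                    : looksDuplicated(prefix) ? "SUSPECT: schema key appears more than once"
                    : "header looks fine";
            System.out.println(arg + ": " + verdict);
        }
    }
}
```

Usage would be something like `java AvroHeaderCheck.java part-file-1.avro part-file-2.avro` (file names are placeholders). Files flagged as suspect would still need to be confirmed with a real Avro reader.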