[
https://issues.apache.org/jira/browse/FLINK-32562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-32562:
-----------------------------------
Labels: pull-request-available (was: )
> FileSink Compactor Service should not use FileWriter from Sink for
> OutputStreamBasedFileCompactor
> -------------------------------------------------------------------------------------------------
>
> Key: FLINK-32562
> URL: https://issues.apache.org/jira/browse/FLINK-32562
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.18.0
> Reporter: Shengnan YU
> Assignee: Ferenc Csaky
> Priority: Major
> Labels: pull-request-available
>
> The gzip format is designed to be concatenable, but the Compactor in
> FileSink breaks this property.
> When the Compactor Service creates the new compacted file through a
> GzipOutputStream, that stream writes its own header bytes, so the final
> file starts with an extra header. (Every finished part file already
> contains a gzip header, so the compacted file does not need another one.)
> This happens because the Compactor Service uses the FileWriter specified
> in FileSink to create the compacted output stream. I think we should use a
> simple byte output stream to concatenate the streams instead, or at least
> offer that as an option.
>
> Currently the ConcatFileCompactor only supports plain text files. Many
> compression codecs, such as gzip and bzip2, support concatenation. I think
> we should support this kind of concatenation; otherwise people must use
> RecordWiseCompactorFactor, which is very inefficient.
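
The concatenation property described above can be sketched in plain Java,
independent of Flink. This is only an illustration under stated assumptions,
not the FileSink or Compactor implementation: the class GzipConcatSketch and
its helpers gzip, concat and gunzip are hypothetical names, in-memory byte
arrays stand in for finished part files, and only java.util.zip from the JDK
is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; none of these names come from Flink.
final class GzipConcatSketch {

    /** Compresses text into one complete gzip member (header, data, trailer). */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Appends the parts byte for byte through a plain, non-compressing stream. */
    static byte[] concat(byte[]... parts) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part);
        }
        return out.toByteArray();
    }

    /** Decompresses all gzip members in data and returns the decoded text. */
    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.write(buffer, 0, n);
            }
        }
        return new String(text.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two "part files", each already a valid gzip member with its own header.
        byte[] compacted = concat(gzip("part-0\n"), gzip("part-1\n"));
        // Reading the raw concatenation yields the content of both parts in order.
        System.out.print(gunzip(compacted));
    }
}
```

Because each part already carries its own gzip header and trailer, the
concatenating stream in this sketch writes no header of its own, which is the
behaviour the description asks the compactor to support.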
--
This message was sent by Atlassian Jira
(v8.20.10#820010)