[ https://issues.apache.org/jira/browse/FLINK-32562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Márton Balassi reassigned FLINK-32562:
--------------------------------------

    Assignee: Ferenc Csaky

> FileSink Compactor Service should not use FileWriter from Sink for OutputStreamBasedFileCompactor
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32562
>                 URL: https://issues.apache.org/jira/browse/FLINK-32562
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.18.0
>            Reporter: Shengnan YU
>            Assignee: Ferenc Csaky
>            Priority: Major
>
> The Gzip format is designed to be concatenatable, but the Compactor in FileSink breaks this.
> When the Compactor Service creates the new compacted file, it does so through a
> GzipOutputStream, which writes extra bytes at the start of the stream, so the final file
> ends up with an extra header. (A Gzip header is already present in every finished part
> file; the compacted file does not need another one.)
> The root cause is that the Compactor Service uses the FileWriter specified in the FileSink
> to create the compacted output stream. I think we should use a simple byte output stream
> to concatenate the streams instead, or at least provide an option for it.
>
> Currently the ConcatFileCompactor only supports plain text files. Many compression codecs,
> such as gzip and bzip2, support concatenation. I think we should support this kind of
> concatenation; otherwise users must fall back to RecordWiseCompactorFactor, which is very
> inefficient.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
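The proposal relies on the fact that a gzip file is a series of self-contained members, so copying finished part files byte for byte already yields a valid gzip file with no new header needed. The sketch below illustrates only that property, using just `java.util.zip`. It does not model Flink's FileSink or compactor classes. The names `GzipConcatDemo`, `gzip`, `concatRaw` and `gunzip` are hypothetical and chosen for illustration, and `concatRaw` stands in for the "simple byte output stream" the reporter suggests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipConcatDemo {

    /** Compresses one "part file" into a complete, self-contained gzip member. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /**
     * Hypothetical compaction step: copies finished part files byte for byte
     * into a plain output stream, without wrapping it in a new GZIPOutputStream,
     * so no additional gzip header is written.
     */
    static byte[] concatRaw(List<byte[]> parts) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part);
        }
        return out.toByteArray();
    }

    /** Decompresses gzip data; GZIPInputStream keeps reading any following members. */
    static String gunzip(byte[] data) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compacted = concatRaw(List.of(gzip("part-0\n"), gzip("part-1\n")));
        // Decompresses to the contents of both part files, in order.
        System.out.print(gunzip(compacted));
    }
}
```

Running `main` prints `part-0` and `part-1` on separate lines. The compacted file is exactly as long as the sum of its parts, which is the "no extra header" property the issue asks for. The example needs Java 9 or later for `List.of` and `InputStream.readAllBytes`.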