[ 
https://issues.apache.org/jira/browse/FLINK-32562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi reassigned FLINK-32562:
--------------------------------------

    Assignee: Ferenc Csaky

> FileSink Compactor Service should not use FileWriter from Sink for 
> OutputStreamBasedFileCompactor
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32562
>                 URL: https://issues.apache.org/jira/browse/FLINK-32562
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.18.0
>            Reporter: Shengnan YU
>            Assignee: Ferenc Csaky
>            Priority: Major
>
> The gzip format is designed to be concatenable, but the Compactor in 
> FileSink breaks this property. 
> When the Compactor Service creates the new compacted file through a 
> GzipOutputStream, that stream writes its own header bytes, so the final 
> file ends up with an extra header. (Every finished part file already 
> contains a gzip header, so the compacted file does not need another one.) 
> This happens because the Compactor Service uses the FileWriter specified in 
> FileSink to create the compacted output stream. I think we should use a 
> simple byte output stream to concatenate the part files instead, or at 
> least offer that as an option.
>  
> Currently the ConcatFileCompactor only supports plain text files. Many 
> compression codecs, such as gzip and bzip2, support concatenation. I think 
> we should support this kind of concatenation as well; otherwise people must 
> use RecordWiseFileCompactor, which is very inefficient.
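
The concatenation property the description relies on can be checked with a minimal sketch. It is written in Python rather than Flink's Java and uses no Flink API. The helper concat_parts and the sample records are hypothetical, and the sketch only illustrates byte-level concatenation, not how the Compactor Service is implemented.

```python
import bz2
import gzip


def concat_parts(parts):
    """Join finished part files byte for byte, without re-encoding them.

    Hypothetical helper: it stands in for a compactor that copies raw bytes
    through a plain output stream instead of a compressing writer.
    """
    return b"".join(parts)


# Two "part files", each compressed independently (sample data is made up).
records_a = b"record-1\nrecord-2\n"
records_b = b"record-3\n"

# gzip: each part is a complete member with its own header and trailer.
gzip_parts = [gzip.compress(records_a), gzip.compress(records_b)]
compacted_gzip = concat_parts(gzip_parts)
restored_gzip = gzip.decompress(compacted_gzip)

# bzip2: independently compressed streams can be concatenated the same way.
bz2_parts = [bz2.compress(records_a), bz2.compress(records_b)]
compacted_bz2 = concat_parts(bz2_parts)
restored_bz2 = bz2.decompress(compacted_bz2)

print(restored_gzip == records_a + records_b)
print(restored_bz2 == records_a + records_b)
```

Because each part already carries its own header, the compacted file is valid without any additional framing, which is why a plain byte stream is sufficient for this case.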



--
This message was sent by Atlassian Jira
(v8.20.10#820010)