aljoscha commented on issue #9434: [FLINK-13634][Formats]: added module to 
compress data using StreamingFileSink
URL: https://github.com/apache/flink/pull/9434#issuecomment-559043668
 
 
   Thanks for your contribution! I think this is a valuable addition.
   
   I do have some comments about naming and structure:
   
   I think the Maven module should be called something like `org.apache.flink.formats.hadoopcompress` to make clear that this is for compression via Hadoop codecs.
   
   We should make the structure similar to `flink-parquet`, which is another format for the `StreamingFileSink`, so that the formats stay consistent for users. Concretely, this means:
    - Add an `org.apache.flink.formats.hadoopcompress.HadoopCompressionWriters` class, similar to `ParquetAvroWriters`, with static methods for creating a writer from the various ways of specifying a codec. These methods currently live in `CompressFileWriterBuilder`, so they should be moved and that class should be removed (see the sketch after this list).
    - `CompressFileWriter` should be called `CompressionWriterFactory`, similar to `ParquetWriterFactory`. The factory can then be returned from the static methods above.
    - I don't see a need for the `Compressor` interface. `HadoopCompressor` should be called `CompressionBulkWriter`, because that's what it is, and it would be created by the `CompressionWriterFactory`.
    - I don't see a need for a "no-compression" mode when the whole purpose of this addition is compression.
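   
   To make the proposed structure concrete, here is a rough sketch of how the pieces could fit together (condensed into one listing). Only the names follow the suggestions above; everything else (`String` elements, looking up the codec by name, UTF-8 encoding, creating the Hadoop `Configuration` inside `create()`) is purely illustrative and not meant as the final design:
   
   ```java
   package org.apache.flink.formats.hadoopcompress;
   
   import org.apache.flink.api.common.serialization.BulkWriter;
   import org.apache.flink.core.fs.FSDataOutputStream;
   
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.io.compress.CompressionCodec;
   import org.apache.hadoop.io.compress.CompressionCodecFactory;
   import org.apache.hadoop.io.compress.CompressionOutputStream;
   
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   
   /** Entry point with static factory methods, analogous to ParquetAvroWriters. */
   public class HadoopCompressionWriters {
   
       /** Creates a writer factory for a Hadoop codec given by name, e.g. "Gzip". */
       public static CompressionWriterFactory forCodecName(String codecName) {
           return new CompressionWriterFactory(codecName);
       }
   }
   
   /** Handed to StreamingFileSink.forBulkFormat(...), analogous to ParquetWriterFactory. */
   class CompressionWriterFactory implements BulkWriter.Factory<String> {
   
       private final String codecName;
   
       CompressionWriterFactory(String codecName) {
           // Only the codec name is kept, so the factory stays serializable.
           this.codecName = codecName;
       }
   
       @Override
       public BulkWriter<String> create(FSDataOutputStream out) throws IOException {
           CompressionCodec codec =
                   new CompressionCodecFactory(new Configuration()).getCodecByName(codecName);
           return new CompressionBulkWriter(codec.createOutputStream(out));
       }
   }
   
   /** The actual BulkWriter: pipes each element through the codec's output stream. */
   class CompressionBulkWriter implements BulkWriter<String> {
   
       private final CompressionOutputStream out;
   
       CompressionBulkWriter(CompressionOutputStream out) {
           this.out = out;
       }
   
       @Override
       public void addElement(String element) throws IOException {
           out.write(element.getBytes(StandardCharsets.UTF_8));
       }
   
       @Override
       public void flush() throws IOException {
           out.flush();
       }
   
       @Override
       public void finish() throws IOException {
           // Finish the compressed stream, but leave closing the underlying
           // FSDataOutputStream to the StreamingFileSink.
           out.finish();
       }
   }
   ```
   
   A user would then pass such a factory to `StreamingFileSink.forBulkFormat(...)`, exactly as with the Parquet writers.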
   
   If we iterate quickly, we can try to get this into the next release.
