aljoscha commented on issue #9434: [FLINK-13634][Formats]: added module to compress data using StreamingFileSink
URL: https://github.com/apache/flink/pull/9434#issuecomment-559043668

Thanks for your contribution! I think this is a valuable addition. I do have some comments about naming and structure:

I think the Maven module should be called something like `org.apache.flink.formats.hadoopcompress` to show that this is for using Hadoop compression. We should make the structure similar to `flink-parquet`, which is another format for the `StreamingFileSink`, in order to give consistency to users. This means (a rough sketch follows below):

- Add an `org.apache.flink.formats.hadoopcompress.HadoopCompressionWriters`, similar to `ParquetAvroWriters`, that has static methods for creating a writer from the various ways of specifying a codec. You have these methods in `CompressFileWriterBuilder` right now, so they should be moved there and that class should be removed.
- `CompressFileWriter` should be called `CompressionWriterFactory`, similar to `ParquetWriterFactory`. This can be returned from the static methods above.
- I don't see a need for the `Compressor` interface. `HadoopCompressor` should be called `CompressionBulkWriter`, because that is what it is, and it would be created from the `CompressionWriterFactory`.
- I don't see a need for a "no-compression" mode when the purpose of this addition is compression.

If we iterate quickly we can try and manage to get this into the next release. :ok
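For illustration, here is a rough Java sketch of how the pieces proposed above could fit together. Everything in it is hypothetical and only follows the naming suggested in the comment: the `forCodecName` method, the lazy codec lookup, and the `toString`-based element encoding are placeholders, not the actual API of the contributed module.

```java
import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.core.fs.FSDataOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Static entry point with factory methods, analogous to ParquetAvroWriters. */
public final class HadoopCompressionWriters {

    /** Creates a writer factory from a Hadoop codec name, e.g. "Gzip" or "Bzip2". */
    public static <T> CompressionWriterFactory<T> forCodecName(String codecName) {
        return new CompressionWriterFactory<>(codecName);
    }

    private HadoopCompressionWriters() {}
}

/** Serializable factory, analogous to ParquetWriterFactory; resolves the codec lazily. */
class CompressionWriterFactory<T> implements BulkWriter.Factory<T> {

    private final String codecName;

    CompressionWriterFactory(String codecName) {
        this.codecName = codecName;
    }

    @Override
    public BulkWriter<T> create(FSDataOutputStream out) throws IOException {
        CompressionCodec codec =
                new CompressionCodecFactory(new Configuration()).getCodecByName(codecName);
        if (codec == null) {
            throw new IOException("Unknown Hadoop compression codec: " + codecName);
        }
        return new CompressionBulkWriter<>(codec.createOutputStream(out));
    }
}

/** The BulkWriter that pushes each element through the codec's compressing stream. */
class CompressionBulkWriter<T> implements BulkWriter<T> {

    private final CompressionOutputStream out;

    CompressionBulkWriter(CompressionOutputStream out) {
        this.out = out;
    }

    @Override
    public void addElement(T element) throws IOException {
        // Placeholder encoding; a real implementation would take a pluggable encoder.
        out.write(element.toString().getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void finish() throws IOException {
        // Finish the compressed stream; the StreamingFileSink closes the underlying file.
        out.finish();
    }
}
```

With such a factory, a sink could (hypothetically) be wired up as `StreamingFileSink.forBulkFormat(outputPath, HadoopCompressionWriters.<String>forCodecName("Gzip")).build()`, which is the same shape users already know from the `flink-parquet` format.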
