zenfenan commented on a change in pull request #11307: [FLINK-16371] [BulkWriter] Fix Hadoop Compression BulkWriter
URL: https://github.com/apache/flink/pull/11307#discussion_r388943884
##########
File path: flink-formats/flink-compress/src/main/java/org/apache/flink/formats/compress/CompressWriterFactory.java
##########
@@ -42,33 +47,41 @@
private Extractor<IN> extractor;
private CompressionCodec hadoopCodec;
+ private String hadoopCodecName;
+ private Map<String, String> hadoopConfigurationMap;
public CompressWriterFactory(Extractor<IN> extractor) {
this.extractor = Preconditions.checkNotNull(extractor, "extractor cannot be null");
+ this.hadoopConfigurationMap = new HashMap<>();
}
public CompressWriterFactory<IN> withHadoopCompression(String hadoopCodecName) {
return withHadoopCompression(hadoopCodecName, new Configuration());
}
public CompressWriterFactory<IN> withHadoopCompression(String hadoopCodecName, Configuration hadoopConfiguration) {
- return withHadoopCompression(new CompressionCodecFactory(hadoopConfiguration).getCodecByName(hadoopCodecName));
- }
+ this.hadoopCodecName = hadoopCodecName;
+
+ for (Map.Entry<String, String> entry : hadoopConfiguration) {
+ hadoopConfigurationMap.put(entry.getKey(), entry.getValue());
+ }
- public CompressWriterFactory<IN> withHadoopCompression(CompressionCodec hadoopCodec) {
Review comment:
Even if it is a custom implementation, providing the fully qualified class
name, the simple class name, or even an alias should be fine. You just have to
add your custom implementation's class name to the Hadoop configuration
property `io.compression.codecs`, either in Hadoop's configuration file or
programmatically.
Hadoop's `CompressionCodecFactory` takes care of the rest using
`ServiceLoader`. I have added test cases to cover that.
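To illustrate the lookup by name described above, here is a rough, stdlib-only sketch. It is not Hadoop's code: `CodecNameLookupSketch`, `Codec` and `MyCustomCodec` are hypothetical stand-ins, the constructor argument plays the role of the `io.compression.codecs` value, and `ServiceLoader` discovery is left out. The alias rule (simple name with a trailing `Codec` removed, case-insensitive) is my assumption about how the short names behave.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stdlib-only sketch; none of these types are Hadoop classes.
class CodecNameLookupSketch {

    /** Hypothetical stand-in for a compression codec. */
    interface Codec {}

    /** Hypothetical custom implementation that a user would register. */
    static class MyCustomCodec implements Codec {}

    private final Map<String, Codec> byClassName = new HashMap<>();
    private final Map<String, Codec> byShortName = new HashMap<>();

    /** Instantiates every class named in a comma-separated list. */
    CodecNameLookupSketch(String codecClassList) {
        for (String className : codecClassList.split(",")) {
            String trimmed = className.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            try {
                Object instance = Class.forName(trimmed).getDeclaredConstructor().newInstance();
                register((Codec) instance);
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Cannot load codec " + trimmed, e);
            }
        }
    }

    /** Indexes a codec by class name, simple name, and (assumed) alias. */
    private void register(Codec codec) {
        Class<?> type = codec.getClass();
        byClassName.put(type.getName(), codec);

        String simple = type.getSimpleName();
        byShortName.put(simple.toLowerCase(Locale.ROOT), codec);
        if (simple.endsWith("Codec")) {
            // Assumed alias: "MyCustomCodec" is also reachable as "mycustom".
            String alias = simple.substring(0, simple.length() - "Codec".length());
            byShortName.put(alias.toLowerCase(Locale.ROOT), codec);
        }
    }

    /** Tries the exact class name first, then the case-insensitive short names. */
    Codec getCodecByName(String name) {
        Codec codec = byClassName.get(name);
        return codec != null ? codec : byShortName.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String configured = MyCustomCodec.class.getName();
        CodecNameLookupSketch factory = new CodecNameLookupSketch(configured);

        System.out.println(factory.getCodecByName(configured) != null);      // true
        System.out.println(factory.getCodecByName("MyCustomCodec") != null); // true
        System.out.println(factory.getCodecByName("mycustom") != null);      // true
        System.out.println(factory.getCodecByName("unknown") != null);       // false
    }
}
```

The point is only that the caller passes a string, so the writer factory never needs a `CompressionCodec` instance handed to it directly.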
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services