[GitHub] [flink] kl0u commented on a change in pull request #11307: [FLINK-16371] [BulkWriter] Fix Hadoop Compression BulkWriter

GitBox Mon, 09 Mar 2020 06:08:01 -0700

kl0u commented on a change in pull request #11307: [FLINK-16371] [BulkWriter] Fix Hadoop Compression BulkWriter
URL: https://github.com/apache/flink/pull/11307#discussion_r389644827


 ##########
 File path: flink-formats/flink-compress/src/main/java/org/apache/flink/formats/compress/CompressWriterFactory.java
 ##########
 @@ -42,39 +47,57 @@
 
        private Extractor<IN> extractor;
        private CompressionCodec hadoopCodec;
+       private String hadoopCodecName;
+       private Map<String, String> hadoopConfigurationMap;
+       private String codecExtension;
 
        public CompressWriterFactory(Extractor<IN> extractor) {
-               this.extractor = Preconditions.checkNotNull(extractor, "extractor cannot be null");
+               this.extractor = checkNotNull(extractor, "Extractor cannot be null");
+               this.hadoopConfigurationMap = new HashMap<>();
        }
 
        public CompressWriterFactory<IN> withHadoopCompression(String hadoopCodecName) {
                return withHadoopCompression(hadoopCodecName, new Configuration());
        }
 
        public CompressWriterFactory<IN> withHadoopCompression(String hadoopCodecName, Configuration hadoopConfiguration) {
-               return withHadoopCompression(new CompressionCodecFactory(hadoopConfiguration).getCodecByName(hadoopCodecName));
-       }
+               CompressionCodec codec = new CompressionCodecFactory(hadoopConfiguration).getCodecByName(hadoopCodecName);
+               this.codecExtension = checkNotNull(codec, "Unable to load the provided Hadoop codec [" + hadoopCodecName + "]")
+                       .getDefaultExtension();
+
+               this.hadoopCodecName = hadoopCodecName;
+
+               for (Map.Entry<String, String> entry : hadoopConfiguration) {
+                       hadoopConfigurationMap.put(entry.getKey(), entry.getValue());
+               }
 
-       public CompressWriterFactory<IN> withHadoopCompression(CompressionCodec hadoopCodec) {
-               this.hadoopCodec = Preconditions.checkNotNull(hadoopCodec, "hadoopCodec cannot be null");
                return this;
        }
 
        @Override
        public BulkWriter<IN> create(FSDataOutputStream out) throws IOException {
-               try {
-                       return (hadoopCodec != null)
-                               ? new HadoopCompressionBulkWriter<>(out, extractor, hadoopCodec)
-                               : new NoCompressionBulkWriter<>(out, extractor);
-               } catch (Exception e) {
-                       throw new IOException(e.getLocalizedMessage(), e);
+               if (hadoopCodecName == null || hadoopCodecName.length() == 0) {
+                       return new NoCompressionBulkWriter<>(out, extractor);
                }
+
+               initializeCompressionCodec();
+
+               return new HadoopCompressionBulkWriter<>(hadoopCodec.createOutputStream(out), extractor);
        }
 
-       public String codecExtension() {
-               return (hadoopCodec != null)
-                       ? hadoopCodec.getDefaultExtension()
-                       : "";
+       public String getExtension() {
+               return (hadoopCodecName != null) ? this.codecExtension : "";
        }
 
+       private void initializeCompressionCodec() {
+               if (hadoopCodec == null) {
+                       Configuration conf = new Configuration();
+
+                       for (Map.Entry<String, String> entry : hadoopConfigurationMap.entrySet()) {
+                               conf.set(entry.getKey(), entry.getValue());
+                       }
+
+                       hadoopCodec = new CompressionCodecFactory(conf).getCodecByName(this.hadoopCodecName);
+               }
+       }
 
 Review comment:
   In general, why not separate the `copyToMap` and `loadFromMap` logic into two static methods, like:
   
   ```
   // Rebuilds a Hadoop Configuration from the key/value pairs stored in the map.
   private static Configuration loadHadoopConfigFromMap(Map<String, String> hadoopConfigMap) {
       checkNotNull(hadoopConfigMap);

       Configuration hadoopConfig = new Configuration();
       for (Map.Entry<String, String> entry : hadoopConfigMap.entrySet()) {
           hadoopConfig.set(entry.getKey(), entry.getValue());
       }
       return hadoopConfig;
   }

   // Copies every entry of the Hadoop Configuration into the given map.
   private static void copyHadoopConfigToMap(Configuration hadoopConfig, Map<String, String> hadoopConfigMap) {
       checkNotNull(hadoopConfig);
       checkNotNull(hadoopConfigMap);

       for (Map.Entry<String, String> entry : hadoopConfig) {
           hadoopConfigMap.put(entry.getKey(), entry.getValue());
       }
   }
   ```
   
   I believe this will make each method self-contained with a single, simple task, which I consider good practice. Also, `initializeCompressionCodec` will become simply:
   
   ```
   private void initializeCompressionCodec() {
       if (hadoopCodec == null) {
           Configuration hadoopConfig = loadHadoopConfigFromMap(hadoopConfigurationMap);
           hadoopCodec = new CompressionCodecFactory(hadoopConfig).getCodecByName(this.hadoopCodecName);
       }
   }
   ``` 
   
   and `withHadoopCompression` will also be simplified.
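   
   To make the round trip concrete, here is a minimal, runnable sketch of the two helpers. It rests on assumptions: `StubConfiguration` is a hypothetical stand-in for Hadoop's `Configuration` so that the snippet compiles without Hadoop on the classpath, `Objects.requireNonNull` replaces Flink's `checkNotNull`, and the class and method names are illustrative only, not the final code for this PR.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;

final class ConfigRoundTripSketch {

    /** Hypothetical stand-in for Hadoop's Configuration: settable and iterable over its entries. */
    static final class StubConfiguration implements Iterable<Map.Entry<String, String>> {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }

        @Override
        public Iterator<Map.Entry<String, String>> iterator() {
            return props.entrySet().iterator();
        }
    }

    /** Flattens the configuration into a plain map that the factory can hold as a field. */
    static void copyConfigToMap(StubConfiguration config, Map<String, String> target) {
        Objects.requireNonNull(config, "config cannot be null");
        Objects.requireNonNull(target, "target cannot be null");
        for (Map.Entry<String, String> entry : config) {
            target.put(entry.getKey(), entry.getValue());
        }
    }

    /** Rebuilds a configuration from the map, for example lazily on first use. */
    static StubConfiguration loadConfigFromMap(Map<String, String> source) {
        Objects.requireNonNull(source, "source cannot be null");
        StubConfiguration config = new StubConfiguration();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            config.set(entry.getKey(), entry.getValue());
        }
        return config;
    }

    public static void main(String[] args) {
        StubConfiguration original = new StubConfiguration();
        original.set("io.compression.codecs", "org.apache.hadoop.io.compress.GzipCodec");

        // Copy side: what withHadoopCompression would do with the user's configuration.
        Map<String, String> stored = new HashMap<>();
        copyConfigToMap(original, stored);

        // Load side: what initializeCompressionCodec would do before creating the codec.
        StubConfiguration restored = loadConfigFromMap(stored);
        System.out.println(restored.get("io.compression.codecs"));
    }
}
```

   With helpers along these lines, the copy loop in `withHadoopCompression` would shrink to a single call to `copyHadoopConfigToMap(hadoopConfiguration, hadoopConfigurationMap)`.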
   
   WDYT?

