skyskyhu opened a new pull request, #6807:
URL: https://github.com/apache/hadoop/pull/6807

   [HADOOP-19167](https://issues.apache.org/jira/browse/HADOOP-19167) Change of 
Codec configuration does not work
   
   ### Description of PR
   In one of my projects, I need to dynamically adjust the compression level for 
different files. 
   However, I found that in most cases the new compression level does not take 
effect: the old compression level continues to be used.
   Here is the relevant code snippet:
   ```
   ZStandardCodec zStandardCodec = new ZStandardCodec();
   zStandardCodec.setConf(conf);
   conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
   conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
   writer = SequenceFile.createWriter(conf,
       SequenceFile.Writer.file(sequenceFilePath),
       SequenceFile.Writer.keyClass(LongWritable.class),
       SequenceFile.Writer.valueClass(BytesWritable.class),
       SequenceFile.Writer.compression(CompressionType.BLOCK));
   ```
   
   Take my unit test as another example:
   ```
       DefaultCodec codec1 = new DefaultCodec();
       Configuration conf = new Configuration();
       ZlibFactory.setCompressionLevel(conf, CompressionLevel.TWO);
       codec1.setConf(conf);
       Compressor comp1 = CodecPool.getCompressor(codec1);
       CodecPool.returnCompressor(comp1);
   
       DefaultCodec codec2 = new DefaultCodec();
       Configuration conf2 = new Configuration();
       CompressionLevel newCompressionLevel = CompressionLevel.THREE;
       ZlibFactory.setCompressionLevel(conf2, newCompressionLevel);
       codec2.setConf(conf2);
       Compressor comp2 = CodecPool.getCompressor(codec2);
   ```
   In the current code, the compression level of comp2 is 2, rather than the 
intended level of 3.
   
   The reason is that SequenceFile.Writer.init() calls 
CodecPool.getCompressor(codec) to get a compressor, which in turn calls 
CodecPool.getCompressor(codec, null).
   If the compressor is a reused instance, the codec's conf is never applied, 
because the conf argument is passed as null:
   ```
   public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
     Compressor compressor = borrow(compressorPool, codec.getCompressorType());
     if (compressor == null) {
       compressor = codec.createCompressor();
       LOG.info("Got brand-new compressor [" + codec.getDefaultExtension() + "]");
     } else {
       compressor.reinit(conf);   // conf is null here
   ......
   ```
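
The same failure mode can be replayed without Hadoop in a toy model. This is only a sketch, not Hadoop code: `ToyCompressor`, `ToyCodec`, the map-based config and the `"level"` key are hypothetical stand-ins for `Compressor`, `CompressionCodec` and `Configuration`. It assumes that `reinit` with a null config leaves the previous level in place, which is the behaviour described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Hadoop's Configuration, Compressor and CodecPool.
class StaleLevelSketch {

  /** Toy compressor: keeps its level unless reinit() receives a config. */
  static final class ToyCompressor {
    private int level;

    ToyCompressor(int level) {
      this.level = level;
    }

    // Assumption: a null config leaves the previously set level untouched.
    void reinit(Map<String, Integer> conf) {
      if (conf == null) {
        return;
      }
      level = conf.getOrDefault("level", level);
    }

    int level() {
      return level;
    }
  }

  /** Toy codec that carries its own configuration. */
  static final class ToyCodec {
    private final Map<String, Integer> conf = new HashMap<>();

    ToyCodec(int level) {
      conf.put("level", level);
    }

    ToyCompressor createCompressor() {
      return new ToyCompressor(conf.get("level"));
    }
  }

  private static final Deque<ToyCompressor> POOL = new ArrayDeque<>();

  // Same shape as the snippet above: a reused instance is reinitialised
  // with whatever conf the caller passed in.
  static ToyCompressor getCompressor(ToyCodec codec, Map<String, Integer> conf) {
    ToyCompressor compressor = POOL.poll();
    if (compressor == null) {
      compressor = codec.createCompressor();
    } else {
      compressor.reinit(conf); // conf is null on the one-argument path
    }
    return compressor;
  }

  static void returnCompressor(ToyCompressor compressor) {
    POOL.push(compressor);
  }

  /** Replays the unit-test scenario and returns the second compressor's level. */
  static int levelAfterReuse() {
    POOL.clear();
    ToyCompressor first = getCompressor(new ToyCodec(2), null);
    returnCompressor(first);
    ToyCompressor second = getCompressor(new ToyCodec(3), null);
    return second.level();
  }

  public static void main(String[] args) {
    System.out.println(levelAfterReuse()); // prints 2, not the intended 3
  }
}
```

In this model the second codec asks for level 3, but the pooled instance still carries level 2 because nothing passes the new codec's config to `reinit`.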
   
   Please also refer to my unit test to reproduce the bug. 
   To address this bug, I modified the code to ensure that the configuration is 
read back from the codec when a compressor is reused.
   
   ### How was this patch tested?
   unit test 



