[ https://issues.apache.org/jira/browse/HADOOP-19167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844925#comment-17844925 ]

ASF GitHub Bot commented on HADOOP-19167:
-----------------------------------------

skyskyhu opened a new pull request, #6807:
URL: https://github.com/apache/hadoop/pull/6807

   [HADOOP-19167](https://issues.apache.org/jira/browse/HADOOP-19167) Change of Codec configuration does not work
   
   ### Description of PR
   In one of my projects, I need to dynamically adjust the compression level for different files.
   However, I found that in most cases the new compression level does not take effect; the old level continues to be used.
   Here is the relevant code snippet:
   ```
   ZStandardCodec zStandardCodec = new ZStandardCodec();
   zStandardCodec.setConf(conf);
   conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
   conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
   writer = SequenceFile.createWriter(conf,
       SequenceFile.Writer.file(sequenceFilePath),
       SequenceFile.Writer.keyClass(LongWritable.class),
       SequenceFile.Writer.valueClass(BytesWritable.class),
       SequenceFile.Writer.compression(CompressionType.BLOCK));
   ```
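   The failure comes from compressor pooling and can be reproduced directly against CodecPool; a minimal sketch (level values hypothetical, assuming the native zstd library is loaded):
   ```
   Configuration conf = new Configuration();
   conf.set("io.compression.codec.zstd.level", "5");
   ZStandardCodec codec = new ZStandardCodec();
   codec.setConf(conf);
   Compressor first = CodecPool.getCompressor(codec);  // brand-new, created at level 5
   CodecPool.returnCompressor(first);                  // goes back into the pool

   conf.set("io.compression.codec.zstd.level", "7");   // intended new level
   codec.setConf(conf);
   Compressor second = CodecPool.getCompressor(codec); // reused instance, still level 5
   ```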
   
   Take my unit test as another example:
   ```
       // First codec: level TWO; its compressor is created and returned to the pool.
       DefaultCodec codec1 = new DefaultCodec();
       Configuration conf = new Configuration();
       ZlibFactory.setCompressionLevel(conf, CompressionLevel.TWO);
       codec1.setConf(conf);
       Compressor comp1 = CodecPool.getCompressor(codec1);
       CodecPool.returnCompressor(comp1);

       // Second codec: level THREE; getCompressor hands back the pooled comp1,
       // so the new level is never applied.
       DefaultCodec codec2 = new DefaultCodec();
       Configuration conf2 = new Configuration();
       CompressionLevel newCompressionLevel = CompressionLevel.THREE;
       ZlibFactory.setCompressionLevel(conf2, newCompressionLevel);
       codec2.setConf(conf2);
       Compressor comp2 = CodecPool.getCompressor(codec2);
   ```
   In the current code, comp2 still compresses at level 2 rather than the intended level 3, because it is the reused comp1 instance.
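   For illustration only, a hypothetical helper (not part of this patch) that reads the effective level of a pooled zlib compressor via reflection, assuming the native ZlibCompressor implementation and a private "level" field:
   ```
   // Hypothetical test helper: the Compressor interface has no level getter, so
   // read ZlibCompressor's private "level" field (an assumed detail) reflectively.
   static ZlibCompressor.CompressionLevel effectiveLevel(Compressor compressor) throws Exception {
     java.lang.reflect.Field level = ZlibCompressor.class.getDeclaredField("level");
     level.setAccessible(true);
     return (ZlibCompressor.CompressionLevel) level.get(compressor);
   }

   // Before the fix, an assertion like this would fail:
   // assertEquals(CompressionLevel.THREE, effectiveLevel(comp2));
   ```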
   
   The root cause is that the SequenceFile.Writer.init() method calls CodecPool.getCompressor(codec) to obtain a compressor, which delegates to CodecPool.getCompressor(codec, null).
   If the compressor is a reused instance, the configuration is never applied because conf is null:
   ```
   public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
     Compressor compressor = borrow(compressorPool, codec.getCompressorType());
     if (compressor == null) {
       compressor = codec.createCompressor();
       LOG.info("Got brand-new compressor [" + codec.getDefaultExtension() + "]");
     } else {
       compressor.reinit(conf);   // conf is null here
       ......
   ```
   
   Please also refer to my unit test to reproduce the bug.
   To address it, I modified the code so that the configuration is read back from the codec when a compressor is reused.
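   A minimal sketch of that approach (the actual patch may differ in details): when the caller passes a null conf and the codec is Configurable, fall back to the codec's own configuration before reinitializing the reused compressor.
   ```
   public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
     Compressor compressor = borrow(compressorPool, codec.getCompressorType());
     if (compressor == null) {
       compressor = codec.createCompressor();
       LOG.info("Got brand-new compressor [" + codec.getDefaultExtension() + "]");
     } else {
       // Sketch of the fix: with a null conf, read the configuration back from the
       // codec (org.apache.hadoop.conf.Configurable) so codec-level settings apply.
       if (conf == null && codec instanceof Configurable) {
         conf = ((Configurable) codec).getConf();
       }
       compressor.reinit(conf);
       ......
   ```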
   
   ### How was this patch tested?
   unit test 




> Change of Codec configuration does not work
> -------------------------------------------
>
>                 Key: HADOOP-19167
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19167
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: compress
>            Reporter: Zhikai Hu
>            Priority: Minor
>              Labels: pull-request-available
>


