skyskyhu opened a new pull request, #6807:

URL: https://github.com/apache/hadoop/pull/6807
[HADOOP-19167](https://issues.apache.org/jira/browse/HADOOP-19167) Change of Codec configuration does not work

### Description of PR

In one of my projects, I need to adjust the compression level dynamically for different files. However, I found that in most cases the new compression level does not take effect as expected: the old compression level continues to be used. Here is the relevant code snippet:

```java
ZStandardCodec zStandardCodec = new ZStandardCodec();
zStandardCodec.setConf(conf);
conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
writer = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(sequenceFilePath),
    SequenceFile.Writer.keyClass(LongWritable.class),
    SequenceFile.Writer.valueClass(BytesWritable.class),
    SequenceFile.Writer.compression(CompressionType.BLOCK));
```

Take my unit test as another example:

```java
// First codec at level TWO; its compressor is returned to the pool.
DefaultCodec codec1 = new DefaultCodec();
Configuration conf = new Configuration();
ZlibFactory.setCompressionLevel(conf, CompressionLevel.TWO);
codec1.setConf(conf);
Compressor comp1 = CodecPool.getCompressor(codec1);
CodecPool.returnCompressor(comp1);

// Second codec at level THREE; the pooled compressor from codec1 is reused.
DefaultCodec codec2 = new DefaultCodec();
Configuration conf2 = new Configuration();
CompressionLevel newCompressionLevel = CompressionLevel.THREE;
ZlibFactory.setCompressionLevel(conf2, newCompressionLevel);
codec2.setConf(conf2);
Compressor comp2 = CodecPool.getCompressor(codec2);
```

In the current code, the compression level of `comp2` is 2 rather than the intended level of 3. The reason is that `SequenceFile.Writer.init()` obtains its compressor via `CodecPool.getCompressor(codec)` (the same call the unit test makes), which eventually calls `CodecPool.getCompressor(codec, null)`.
If the compressor is a reused instance, the configuration is not applied, because `conf` is passed as `null`:

```java
public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
  Compressor compressor = borrow(compressorPool, codec.getCompressorType());
  if (compressor == null) {
    compressor = codec.createCompressor();
    LOG.info("Got brand-new compressor [" + codec.getDefaultExtension() + "]");
  } else {
    compressor.reinit(conf); // conf is null here
    ......
```

Please also refer to my unit test to reproduce the bug.

To fix this bug, I modified the code so that, when a compressor is reused, the configuration is read back from the codec.

### How was this patch tested?

Unit test.
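The bug and the fix described above can be modeled without Hadoop on the classpath. This is a hypothetical sketch, not the patch or its unit test: `CodecPoolSketch`, `ToyConf`, `ToyCodec`, and `ToyCompressor` are illustrative stand-ins, not Hadoop classes. It assumes, as the description states, that the one-argument `getCompressor(codec)` ends up passing a `null` configuration and that `reinit(null)` leaves a pooled compressor's level unchanged. The hypothetical `fallBackToCodecConf` flag switches on one possible form of the fix: reading the configuration back from the codec when a compressor is reused.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free model of the CodecPool reuse path.
// None of these classes exist in Hadoop; the names are illustrative only.
class CodecPoolSketch {

  /** Stand-in for Configuration: a plain string map holding a "level" key. */
  static final class ToyConf {
    final Map<String, String> props = new HashMap<>();
    int level() { return Integer.parseInt(props.getOrDefault("level", "6")); }
  }

  /** Stand-in for a compressor whose reinit(null) keeps its old level. */
  static final class ToyCompressor {
    int level;
    ToyCompressor(int level) { this.level = level; }
    void reinit(ToyConf conf) {
      if (conf == null) {
        return; // assumed behavior from the description: a null conf leaves the level unchanged
      }
      level = conf.level();
    }
  }

  /** Stand-in for a configurable codec that remembers the conf set on it. */
  static final class ToyCodec {
    private ToyConf conf;
    void setConf(ToyConf conf) { this.conf = conf; }
    ToyConf getConf() { return conf; }
    ToyCompressor createCompressor() { return new ToyCompressor(conf.level()); }
  }

  /** Single-type pool: enough to show borrowing and returning one compressor type. */
  static final Deque<ToyCompressor> pool = new ArrayDeque<>();

  static ToyCompressor getCompressor(ToyCodec codec, ToyConf conf, boolean fallBackToCodecConf) {
    ToyCompressor c = pool.poll();
    if (c == null) {
      return codec.createCompressor(); // brand-new compressor reads the codec's conf
    }
    if (conf == null && fallBackToCodecConf) {
      conf = codec.getConf(); // sketch of the fix: read the conf back from the codec on reuse
    }
    c.reinit(conf);
    return c;
  }

  static void returnCompressor(ToyCompressor c) { pool.push(c); }

  /** Replays the unit-test scenario (level 2, then level 3) through the null-conf path. */
  static int levelAfterReuse(boolean fallBackToCodecConf) {
    pool.clear();
    ToyConf conf1 = new ToyConf();
    conf1.props.put("level", "2");
    ToyCodec codec1 = new ToyCodec();
    codec1.setConf(conf1);
    returnCompressor(getCompressor(codec1, null, fallBackToCodecConf));

    ToyConf conf2 = new ToyConf();
    conf2.props.put("level", "3");
    ToyCodec codec2 = new ToyCodec();
    codec2.setConf(conf2);
    return getCompressor(codec2, null, fallBackToCodecConf).level;
  }

  public static void main(String[] args) {
    System.out.println("without fallback: " + levelAfterReuse(false)); // without fallback: 2
    System.out.println("with fallback: " + levelAfterReuse(true));     // with fallback: 3
  }
}
```

Running `main` replays the unit-test scenario: without the fallback, the reused compressor keeps the stale level 2; with it, the compressor picks up level 3. The flag exists only to compare both paths side by side in the sketch; it is not a proposed configuration option.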
