[ https://issues.apache.org/jira/browse/HADOOP-19167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844925#comment-17844925 ]
ASF GitHub Bot commented on HADOOP-19167:
-----------------------------------------

skyskyhu opened a new pull request, #6807:
URL: https://github.com/apache/hadoop/pull/6807

[HADOOP-19167](https://issues.apache.org/jira/browse/HADOOP-19167) Change of Codec configuration does not work

### Description of PR

In one of my projects, I need to dynamically adjust the compression level for different files. However, I found that in most cases the new compression level does not take effect as expected; the old compression level continues to be used. Here is the relevant code snippet:

```
ZStandardCodec zStandardCodec = new ZStandardCodec();
zStandardCodec.setConf(conf);
conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
writer = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(sequenceFilePath),
    SequenceFile.Writer.keyClass(LongWritable.class),
    SequenceFile.Writer.valueClass(BytesWritable.class),
    SequenceFile.Writer.compression(CompressionType.BLOCK));
```

Take my unit test as another example:

```
DefaultCodec codec1 = new DefaultCodec();
Configuration conf = new Configuration();
ZlibFactory.setCompressionLevel(conf, CompressionLevel.TWO);
codec1.setConf(conf);
Compressor comp1 = CodecPool.getCompressor(codec1);
CodecPool.returnCompressor(comp1);

DefaultCodec codec2 = new DefaultCodec();
Configuration conf2 = new Configuration();
CompressionLevel newCompressionLevel = CompressionLevel.THREE;
ZlibFactory.setCompressionLevel(conf2, newCompressionLevel);
codec2.setConf(conf2);
Compressor comp2 = CodecPool.getCompressor(codec2);
```

With the current code, the compression level of comp2 is 2 rather than the intended 3. The reason is that SequenceFile.Writer.init() calls CodecPool.getCompressor(codec) to obtain a compressor, which in turn calls CodecPool.getCompressor(codec, null).
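The stale-level behavior described above can be illustrated with a minimal, self-contained model of the pool. This is not Hadoop's actual CodecPool code; the classes below are simplified stand-ins (an Integer plays the role of the Configuration) that only mimic the reuse path where reinit is handed a null conf:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PooledCompressorDemo {
    static class Compressor {
        int level = -1;
        // Mirrors Compressor.reinit(Configuration): a null conf is a no-op,
        // so a reused instance keeps whatever level it was created with.
        void reinit(Integer conf) {
            if (conf != null) level = conf;
        }
    }

    static final Deque<Compressor> pool = new ArrayDeque<>();

    static Compressor getCompressor(Integer codecLevel) {
        Compressor c = pool.poll();
        if (c == null) {
            c = new Compressor();
            c.level = codecLevel;   // brand-new: codec's level is applied
        } else {
            c.reinit(null);         // reused: conf is null, old level survives
        }
        return c;
    }

    public static void main(String[] args) {
        Compressor first = getCompressor(2);  // codec configured with level 2
        pool.push(first);                     // returnCompressor(...)
        Compressor second = getCompressor(3); // codec configured with level 3
        System.out.println(second.level);     // prints 2, not 3
    }
}
```

Because the pool hands back the same cached instance and the reuse branch passes null, the new level never reaches the compressor, matching what the unit test observes for comp2.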
If the compressor is a reused instance, the conf is not applied because it is passed as null:

```
public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
  Compressor compressor = borrow(compressorPool, codec.getCompressorType());
  if (compressor == null) {
    compressor = codec.createCompressor();
    LOG.info("Got brand-new compressor ["+codec.getDefaultExtension()+"]");
  } else {
    compressor.reinit(conf); // conf is null here
    ......
```

Please also refer to my unit test to reproduce the bug. To address this bug, I modified the code to ensure that the configuration is read back from the codec when a compressor is reused.

### How was this patch tested?

Unit test.

> Change of Codec configuration does not work
> -------------------------------------------
>
>                 Key: HADOOP-19167
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19167
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: compress
>            Reporter: Zhikai Hu
>            Priority: Minor
>              Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org