[ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630469#action_12630469 ]
Chris Douglas commented on HADOOP-4162:
---------------------------------------

bq. The way we use Hadoop Compression in TFile is to take each compression block as a separate compression stream (each block writes conclude with compressor.finish()). It makes no assumption of any internals of compression algorithm. The tests show both LZOP and LZO work fine.

LZOP works because the streams are generated by LzopCodec, which disables all the block checksums (assuming its target will be HDFS, which keeps its own checksums). In that case, the LzopDecompressor is a passthrough to LzoDecompressor. If someone were to pick up an LzopDecompressor and use it on a stream with block checksums, it would fail if that decompressor were later reused to open a TFile. Until LzopDecompressors can be reused without errors (i.e. initHeaderFlags clears the checksum flags before setting them for the next stream), I'm -1 on making them reusable through CodecPool.

bq. it seems that existence of LzopDecompressor is to read lzop compressed data. So I changed to use LZO instead of LZOP internally for TFile now.

That sounds exactly right. Unless one wants to support the C tool, LzoCodec should always be preferred.

> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-4162
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4162
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.0
>            Reporter: Hong Tang
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4162_0_20080911.patch
>
>
> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.
> I investigated the code; the reason seems to be the following:
> LzopCodec inherits from LzoCodec. The getDecompressorType() method is
> supposed to return the concrete Decompressor class type the specific Codec
> class creates. In this case, LzopCodec creates LzopDecompressors and should
> return LzopDecompressor.class. But instead, it uses the getDecompressorType()
> method defined in the parent and returns LzoDecompressor.class.
> This leaves CodecPool unable to properly recycle the decompressors.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
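
The recycling failure described in the issue can be illustrated with a toy model. This is a minimal sketch, not Hadoop code: every class and method name below (PoolSketch, BaseCodec, BuggyDerivedCodec, FixedDerivedCodec, Pool, borrow, giveBack) is a hypothetical stand-in, and the pool's behavior (look up by the codec's declared decompressor type, file returned instances under their runtime class) is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class PoolSketch {
    // Hypothetical stand-ins for the parent and child decompressor classes.
    static class BaseDecompressor {}
    static class DerivedDecompressor extends BaseDecompressor {}

    // Hypothetical parent codec: creates and declares BaseDecompressor.
    static class BaseCodec {
        BaseDecompressor createDecompressor() { return new BaseDecompressor(); }
        Class<? extends BaseDecompressor> getDecompressorType() {
            return BaseDecompressor.class;
        }
    }

    // Models the reported bug: creates DerivedDecompressor instances but
    // inherits the parent's getDecompressorType().
    static class BuggyDerivedCodec extends BaseCodec {
        @Override BaseDecompressor createDecompressor() {
            return new DerivedDecompressor();
        }
    }

    // Models the fix: the declared type matches what is actually created.
    static class FixedDerivedCodec extends BuggyDerivedCodec {
        @Override Class<? extends BaseDecompressor> getDecompressorType() {
            return DerivedDecompressor.class;
        }
    }

    // Toy pool (assumed behavior): borrow() looks up idle instances by the
    // codec's declared type; giveBack() files them under the runtime class.
    static class Pool {
        private final Map<Class<?>, Deque<BaseDecompressor>> idle = new HashMap<>();
        int created = 0;

        BaseDecompressor borrow(BaseCodec codec) {
            Deque<BaseDecompressor> queue = idle.get(codec.getDecompressorType());
            if (queue != null && !queue.isEmpty()) {
                return queue.pop();          // reuse an idle instance
            }
            created++;                       // pool miss: build a new one
            return codec.createDecompressor();
        }

        void giveBack(BaseDecompressor d) {
            idle.computeIfAbsent(d.getClass(), k -> new ArrayDeque<>()).push(d);
        }
    }

    // Borrow and return twice; report how many instances had to be created.
    static int creationsAfterTwoBorrows(BaseCodec codec) {
        Pool pool = new Pool();
        pool.giveBack(pool.borrow(codec));
        pool.giveBack(pool.borrow(codec));
        return pool.created;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + creationsAfterTwoBorrows(new BuggyDerivedCodec()));
        System.out.println("fixed: " + creationsAfterTwoBorrows(new FixedDerivedCodec()));
    }
}
```

In this model the buggy codec never gets a pool hit, because instances are filed under one class and looked up under another. Note that matching the types only makes reuse possible; per the comment above, reuse is safe only once the decompressor resets its per-stream state.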