[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Chris Douglas (JIRA) Thu, 11 Sep 2008 19:54:06 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630469#action_12630469
 ]


Chris Douglas commented on HADOOP-4162:
---------------------------------------

bq. The way we use Hadoop Compression in TFile is to treat each compression 
block as a separate compression stream (each block write concludes with 
compressor.finish()). It makes no assumptions about the internals of the 
compression algorithm. The tests show both LZOP and LZO work fine.
LZOP works because the streams are generated by LzopCodec, which disables all 
the block checksums (assuming its target will be HDFS, which keeps its own 
checksums). In that case, the LzopDecompressor is a passthrough to 
LzoDecompressor. If someone were to pick up an LzopDecompressor and use it on a 
stream with block checksums, it would fail if that decompressor were then 
reused to open a TFile. Until LzopDecompressors can be reused without errors 
(i.e. initHeaderFlags clears the checksum flags before setting them for the 
next stream), I'm -1 on making them reusable through CodecPool.
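
To make the reuse hazard concrete, here is a minimal sketch of the behavior 
described above. It is not the Hadoop source: the class, the enum, and the 
clearOnInit switch are hypothetical, and only the method name initHeaderFlags 
is borrowed from the comment. It assumes the decompressor accumulates the 
checksum flags it reads from each stream header.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of a pooled decompressor; not the real LzopDecompressor.
final class HeaderFlagsSketch {
    enum Checksum { ADLER32, CRC32 }

    static final class ReusableDecompressor {
        // Checksums this instance believes follow every block.
        private final EnumSet<Checksum> expected = EnumSet.noneOf(Checksum.class);
        // true models the proposed fix, false models the stale-flag behavior.
        private final boolean clearOnInit;

        ReusableDecompressor(boolean clearOnInit) {
            this.clearOnInit = clearOnInit;
        }

        // Called once per stream with the checksums that stream's header declares.
        void initHeaderFlags(Set<Checksum> declared) {
            if (clearOnInit) {
                expected.clear(); // forget whatever the previous stream declared
            }
            expected.addAll(declared);
        }

        // Number of checksum words the instance will try to read per block.
        int checksumsExpectedPerBlock() {
            return expected.size();
        }
    }
}
```

With clearOnInit set to false, an instance that first reads a stream declaring 
ADLER32 and is then handed a stream with no checksums still expects one 
checksum per block, so it would misread the second stream. With clearOnInit 
set to true, the expectation drops back to zero.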

bq. it seems that existence of LzopDecompressor is to read lzop compressed 
data. So I changed to use LZO instead of LZOP internally for TFile now.
That sounds exactly right. Unless one wants to support the C tool, LzoCodec 
should always be preferred.

> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-4162
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4162
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.0
>            Reporter: Hong Tang
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4162_0_20080911.patch
>
>
> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor. 
> I investigated the code; the reason seems to be the following:
> LzopCodec inherits from LzoCodec. The getDecompressorType() method is 
> supposed to return the concrete Decompressor class type the specific Codec 
> class creates. In this case, LzopCodec creates LzopDecompressors and should 
> return LzopDecompressor.class. But instead, it uses the getDecompressorType() 
> method defined in the parent and returns LzoDecompressor.class.
> This leaves CodecPool unable to properly recycle the decompressors.
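
The type mismatch in the issue description above can be sketched as follows. 
This is a simplified model, not the Hadoop CodecPool: every class name here is 
hypothetical, and it assumes the pool looks up idle instances by the type the 
codec declares while filing returned instances under their runtime class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the Lzo/Lzop codec and decompressor classes.
final class PoolKeySketch {
    static class LzoLikeDecompressor {}
    static class LzopLikeDecompressor extends LzoLikeDecompressor {}

    static class LzoLikeCodec {
        Class<? extends LzoLikeDecompressor> getDecompressorType() {
            return LzoLikeDecompressor.class;
        }
        LzoLikeDecompressor createDecompressor() {
            return new LzoLikeDecompressor();
        }
    }

    static class LzopLikeCodec extends LzoLikeCodec {
        private final boolean overrideType; // true models the fix
        LzopLikeCodec(boolean overrideType) {
            this.overrideType = overrideType;
        }
        @Override
        Class<? extends LzoLikeDecompressor> getDecompressorType() {
            // Without the override, the parent's type is reported.
            return overrideType ? LzopLikeDecompressor.class : super.getDecompressorType();
        }
        @Override
        LzoLikeDecompressor createDecompressor() {
            return new LzopLikeDecompressor();
        }
    }

    static final class Pool {
        private final Map<Class<?>, Deque<LzoLikeDecompressor>> idle = new HashMap<>();
        int created = 0;

        LzoLikeDecompressor borrow(LzoLikeCodec codec) {
            // Lookup uses the type the codec declares.
            Deque<LzoLikeDecompressor> q = idle.get(codec.getDecompressorType());
            if (q != null && !q.isEmpty()) {
                return q.pop();
            }
            created++;
            return codec.createDecompressor();
        }

        void giveBack(LzoLikeDecompressor d) {
            // Filing uses the runtime class of the instance.
            idle.computeIfAbsent(d.getClass(), k -> new ArrayDeque<>()).push(d);
        }
    }

    // Borrow, return, borrow again; report how many instances were created.
    static int creationsForTwoBorrows(boolean overrideType) {
        Pool pool = new Pool();
        LzoLikeCodec codec = new LzopLikeCodec(overrideType);
        pool.giveBack(pool.borrow(codec));
        pool.borrow(codec);
        return pool.created;
    }
}
```

In this model, creationsForTwoBorrows(false) creates two instances because the 
returned decompressor is filed under a key the next lookup never uses, while 
creationsForTwoBorrows(true) creates only one.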

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
