[jira] Commented: (HADOOP-4640) Add ability to split text files compressed with lzo

Chris Douglas (JIRA) Tue, 18 Nov 2008 18:28:06 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648887#action_12648887
 ]


Chris Douglas commented on HADOOP-4640:
---------------------------------------

bq. As for the close() I did as suggested, although it rubs me the wrong way to 
read all those bytes without needing to. I guess the practical performance 
impact will be minimal though.

It's only calculating a checksum of the remaining bytes from a direct buffer. 
For the default 64k block, I'd guess it adds somewhere between 20 and 50ms in 
the close. If it had to make another trip to the native code, I agree that 
would be improper, but this should be a trivial cost. 
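
To make the cost concrete, here is a minimal sketch of checksumming whatever 
is left in a direct buffer. The class and method names are hypothetical, and 
Adler32 is only a stand-in for whichever checksum the codec actually uses; 
this is not the code from the patch.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

public class RemainingChecksumSketch {
    // Hypothetical helper: checksums the bytes between the buffer's current
    // position and its limit, entirely on the Java side (no native call).
    static long checksumRemaining(ByteBuffer buf) {
        Adler32 sum = new Adler32();   // stand-in; the real codec may use another checksum
        sum.update(buf);               // consumes buf.remaining() bytes, advancing position to limit
        return sum.getValue();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // default 64k block
        buf.position(1024);            // pretend the first 1k was already consumed
        System.out.println(checksumRemaining(buf));
    }
}
```

The work is a single pass over at most one block, which is why it should 
stay cheap compared with another round trip into native code.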

I'm not sure I follow LzoIndex::findIndexPosition. Given {{\{0, 5, 10, 15\}}} 
as block positions, findIndexPosition(1) will return 10, but 
findIndexPosition(5) returns 5. Should the former case also return 5? 
findIndexPosition(11) returns -1, which also seems contrary to its javadoc 
explanation.
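
For reference, this is a sketch of the behaviour I would have expected, 
assuming the intent is "return the first block position at or after the given 
offset, or -1 if there is none" and that the positions are sorted ascending. 
Both are assumptions on my part, and the class below is a hypothetical 
stand-in, not the LzoIndex from the patch.

```java
import java.util.Arrays;

public class LzoIndexSketch {
    // Hypothetical version of findIndexPosition: returns the first block
    // position >= pos, or -1 when pos lies past the last block start.
    // Assumes blockPositions is sorted in ascending order.
    static long findIndexPosition(long[] blockPositions, long pos) {
        int i = Arrays.binarySearch(blockPositions, pos);
        if (i < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1, where
            // insertionPoint is the index of the first element greater than pos.
            i = -(i + 1);
        }
        return i < blockPositions.length ? blockPositions[i] : -1;
    }

    public static void main(String[] args) {
        long[] blocks = {0, 5, 10, 15};
        System.out.println(findIndexPosition(blocks, 1));   // 5
        System.out.println(findIndexPosition(blocks, 5));   // 5
        System.out.println(findIndexPosition(blocks, 11));  // 15
        System.out.println(findIndexPosition(blocks, 16));  // -1
    }
}
```

With these semantics, offsets 1 and 5 both map to block 5, and -1 is only 
returned once the offset is beyond the last block start.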

> Add ability to split text files compressed with lzo
> ---------------------------------------------------
>
>                 Key: HADOOP-4640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4640
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io, mapred
>            Reporter: Johan Oskarsson
>            Assignee: Johan Oskarsson
>            Priority: Trivial
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4640.patch, HADOOP-4640.patch, HADOOP-4640.patch
>
>
> Right now any file compressed with lzop will be processed by one mapper. This 
> is a shame since the lzo algorithm would be very suitable for large log files 
> and similar common hadoop data sets. The compression rate is not the best out 
> there but the decompression speed is amazing.  Since lzo writes compressed 
> data in blocks it would be possible to make an input format that can split 
> the files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
