[ https://issues.apache.org/jira/browse/HADOOP-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646985#action_12646985 ]
Doug Cutting commented on HADOOP-4640:
--------------------------------------

> What is our policy on this?

I don't know that we have a clear policy. In this case, I think it would be fine for the tests to succeed with a warning if native code is not available. Ideally we should have tests that are only run when native code is available.

A few questions:
 - Should the InputFormat require the index, as in your patch, or rather should it degrade gracefully, so that if indexes do not exist it creates a single split per file?
 - It would be great to have an OutputFormat that creates indexes as files are written. Is that possible?

> Add ability to split text files compressed with lzo
> ---------------------------------------------------
>
>                 Key: HADOOP-4640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4640
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io, mapred
>            Reporter: Johan Oskarsson
>            Assignee: Johan Oskarsson
>            Priority: Trivial
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4640.patch
>
>
> Right now any file compressed with lzop will be processed by one mapper. This
> is a shame since the lzo algorithm would be very suitable for large log files
> and similar common hadoop data sets. The compression rate is not the best out
> there but the decompression speed is amazing. Since lzo writes compressed
> data in blocks it would be possible to make an input format that can split
> the files.
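
To make the "degrade gracefully" question concrete, here is a minimal sketch of how split computation could fall back to one split per file when no index exists. It is not taken from HADOOP-4640.patch and does not use the Hadoop API; the class `LzoSplitSketch`, the `Split` type, the `computeSplits` method, and the idea of an index as a sorted array of block start offsets are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LzoSplitSketch {

    /** A byte range of one compressed file, to be handed to one mapper (hypothetical type). */
    public static final class Split {
        public final long start;
        public final long length;

        public Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Computes splits for one compressed file.
     *
     * @param fileLength   total size of the compressed file in bytes
     * @param blockOffsets sorted offsets at which compressed blocks begin,
     *                     or null if no index was found (assumed index layout)
     * @param targetSize   minimum size in bytes before a split may be closed
     */
    public static List<Split> computeSplits(long fileLength, long[] blockOffsets, long targetSize) {
        List<Split> splits = new ArrayList<>();
        if (fileLength <= 0) {
            return splits;
        }
        // Graceful degradation: without an index the file cannot be cut safely,
        // so the whole file becomes a single split.
        if (blockOffsets == null || blockOffsets.length == 0) {
            splits.add(new Split(0, fileLength));
            return splits;
        }
        long start = 0; // the first split also covers any file header before the first block
        for (long offset : blockOffsets) {
            // Cut only at a block boundary, and only once the current split is large enough.
            if (offset > start && offset < fileLength && offset - start >= targetSize) {
                splits.add(new Split(start, offset - start));
                start = offset;
            }
        }
        // Whatever remains after the last cut forms the final split.
        splits.add(new Split(start, fileLength - start));
        return splits;
    }
}
```

With this shape, a missing index costs parallelism but not correctness: the job still runs, just with one mapper for that file, which is the current behaviour described in the issue.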