[ https://issues.apache.org/jira/browse/HADOOP-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521538 ]
Owen O'Malley commented on HADOOP-1694:
---------------------------------------
No, Nigel sent the output of word count, so that is just part of the compressed
file that got interpreted as "words". The problem, I suspect, is that the lzo
file extension is not in the default config.
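
To illustrate the suspicion above, here is a minimal sketch of suffix-based codec
selection. It is not Hadoop's code: the table contents, the codec names, and the
pick_codec helper are all hypothetical, chosen only to show why a file with an
unregistered extension would fall through and be read as plain text.

```python
import os

# Hypothetical suffix-to-codec table standing in for a "default config".
# The entries are illustrative assumptions, not Hadoop's actual defaults.
DEFAULT_CODECS = {".gz": "gzip", ".deflate": "deflate"}


def pick_codec(path, codecs=DEFAULT_CODECS):
    """Return the codec name registered for the file's last suffix,
    or None when no codec matches (the file is then read as plain text)."""
    suffix = os.path.splitext(path)[1]
    return codecs.get(suffix)


inputs = [
    "/user/hadoopqa/input/part-001.txt",
    "/user/hadoopqa/input/part-002.txt.gz",
    "/user/hadoopqa/input/part-003.txt.lzo",
]

for path in inputs:
    print(path, "->", pick_codec(path) or "plain text")

# Registering the ".lzo" suffix changes the outcome for the third file.
with_lzo = dict(DEFAULT_CODECS)
with_lzo[".lzo"] = "lzo"
print(inputs[2], "->", pick_codec(inputs[2], with_lzo))
```

With the default table the .lzo file gets no codec, so its compressed bytes would
be tokenized as if they were text, which matches the garbage "words" in the
reported output. Whether that is the actual cause here still needs confirming.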
> lzo compressed input files not properly recognized
> --------------------------------------------------
>
> Key: HADOOP-1694
> URL: https://issues.apache.org/jira/browse/HADOOP-1694
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.14.0
> Reporter: Nigel Daley
> Fix For: 0.15.0
>
>
> When running the wordcount example with text, gzip and lzo compressed input
> files, the lzo compressed input files are not properly recognized and are
> treated as text files.
> With an input dir of
> {quote}
> /user/hadoopqa/input/part-001.txt
> /user/hadoopqa/input/part-002.txt.gz
> /user/hadoopqa/input/part-003.txt.lzo
> {quote}
> and running this command
> {quote}
> bin/hadoopqa jar hadoop-examples.jar wordcount /user/hadoopqa/input
> /user/hadoopqa/output
> {quote}
> I get output that looks like
> {quote}
> row 4
> royal 4
> rt$3-ex?ÔøΩ?÷µIStÔøΩ"4D%ÔøΩ9$UÔøΩÔøΩ"ÔøΩ, 1
> ru$ÔøΩÔøΩ#~t"@ÔøΩm*d#\/$ÔøΩÔøΩl.t"XÔøΩÔøΩDi" 1
> rubbÔøΩdÔøΩ&@bT 1
> rubbed 2
> {quote}
> To lzo compress the file I used lzop:
> http://www.lzop.org/download/lzop-1.01-linux_i386.tar.gz
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.