[ https://issues.apache.org/jira/browse/HADOOP-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549400 ]
Nigel Daley commented on HADOOP-1694:
-------------------------------------

I'm running the job as follows, with the lzo library installed on the cluster:

hadoop --config ~/c jar $HADOOP_HOME/hadoop-0.15-examples.jar wordcount \
  -Dio.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.LzoCodec \
  /user/hadoopqa/validation/data/wordCountInput \
  /user/hadoopqa/validation/data/mapredWordCountOutput

The map that gets the single .lzo file throws this exception:

...
07/12/07 09:38:20 INFO mapred.JobClient: map 57% reduce 0%
07/12/07 09:38:20 INFO mapred.JobClient: Task Id : task_200712070937_0001_m_000010_0, Status : FAILED
java.io.EOFException
        at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:106)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:136)
        at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:128)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:117)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:174)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

> lzo compressed input files not properly recognized
> --------------------------------------------------
>
>                 Key: HADOOP-1694
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1694
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>   Affects Versions: 0.14.0
>            Reporter: Nigel Daley
>            Assignee: Tahir Hashmi
>             Fix For: 0.16.0
>
>
> When running the wordcount example with text, gzip and lzo compressed input
> files, the lzo compressed input files are not properly recognized and are
> treated as text files.
> With an input dir of
> {quote}
> /user/hadoopqa/input/part-001.txt
> /user/hadoopqa/input/part-002.txt.gz
> /user/hadoopqa/input/part-003.txt.lzo
> {quote}
> and running this command
> {quote}
> bin/hadoopqa jar hadoop-examples.jar wordcount /user/hadoopqa/input
> /user/hadoopqa/output
> {quote}
> I get output that looks like
> {quote}
> row 4
> royal 4
> rt$3-ex?ÔøΩ?÷µIStÔøΩ"4D%ÔøΩ9$UÔøΩÔøΩ"ÔøΩ, 1
> ru$ÔøΩÔøΩ#~t"@ÔøΩm*d#\/$ÔøΩÔøΩl.t"XÔøΩÔøΩDi" 1
> rubbÔøΩdÔøΩ&@bT 1
> rubbed 2
> {quote}
> To lzo compress the file I used lzop:
> http://www.lzop.org/download/lzop-1.01-linux_i386.tar.gz

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
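
For readers following the report above, here is a minimal sketch of how suffix-based codec selection can produce the "treated as text files" symptom. It is an illustration only and does not reproduce Hadoop's actual lookup code: the class `SuffixCodecLookup`, its methods, and the codec labels are all hypothetical, and the assumption that no codec was registered for the `.lzo` suffix in the original run is mine, not something the report confirms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Hadoop's codec lookup code.
class SuffixCodecLookup {
    // Maps a file-name suffix to a codec label (labels are made up for this sketch).
    private final Map<String, String> codecsBySuffix = new LinkedHashMap<>();

    void register(String suffix, String codecName) {
        codecsBySuffix.put(suffix, codecName);
    }

    // Returns the codec label for a file name, or "text" when no suffix matches.
    String codecFor(String fileName) {
        for (Map.Entry<String, String> entry : codecsBySuffix.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "text"; // fallback: the bytes are read as plain text
    }

    public static void main(String[] args) {
        SuffixCodecLookup lookup = new SuffixCodecLookup();
        lookup.register(".gz", "gzip"); // assumed: only gzip is registered
        String[] inputs = {"part-001.txt", "part-002.txt.gz", "part-003.txt.lzo"};
        for (String name : inputs) {
            System.out.println(name + " -> " + lookup.codecFor(name));
        }
    }
}
```

In this sketch, a file whose suffix has no registered codec falls through to the plain-text path, so its compressed bytes would be tokenized as words. Registering a codec for the suffix changes which reader is chosen, but it does not by itself guarantee that the codec understands the container format written by a particular tool.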