Here are the contents of part-00000.deflate:

[r...@tyu-linux datajoin]# od output/part-00000.deflate
0000000 116170 152163 032113 162064 002164 043222 047234 111040
0000020 010713 142115 022030 006142 030426 105406 015430 000561
0000040 021511 010444 130043 115010 000032 167126 116422
0000056
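The first word of the dump already hints at the format. GNU od with no options prints octal 2-byte words in host byte order, so on a little-endian (x86) machine (an assumption about tyu-linux), 116170 is the byte pair 0x78 0x9C. That is a standard zlib (RFC 1950) stream header, which is consistent with Hadoop's DefaultCodec, the codec that uses the .deflate extension. The sketch below rebuilds the bytes from od's words and checks that header. The class OdDump and its methods are hypothetical helpers for illustration, not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper (not part of Hadoop): rebuilds raw bytes from the
// default `od` output (octal, 2-byte words) and checks for a zlib header.
final class OdDump {
    // Assumes the dump was taken on a little-endian host: od prints each
    // 2-byte word in host byte order, so the low byte came first in the file.
    static byte[] wordsToBytes(String octalWords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String tok : octalWords.trim().split("\\s+")) {
            int word = Integer.parseInt(tok, 8);
            out.write(word & 0xFF);          // low byte = first byte in file
            out.write((word >>> 8) & 0xFF);  // high byte = second byte
        }
        return out.toByteArray();
    }

    // RFC 1950: the low nibble of CMF is 8 (deflate), and the 16-bit value
    // CMF * 256 + FLG must be a multiple of 31.
    static boolean looksLikeZlib(byte[] b) {
        if (b.length < 2) return false;
        int cmf = b[0] & 0xFF;
        int flg = b[1] & 0xFF;
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    public static void main(String[] args) {
        // First row of the dump from the message, offset column removed.
        byte[] b = wordsToBytes("116170 152163 032113 162064 002164 043222 047234 111040");
        System.out.printf("first bytes: %02x %02x, zlib header: %b%n",
                b[0] & 0xFF, b[1] & 0xFF, looksLikeZlib(b));
    }
}
```

Running main prints "first bytes: 78 9c, zlib header: true". On a big-endian host the byte order in wordsToBytes would have to be swapped.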
Does anyone know how I can view the text with a hadoop command?

Thanks

---------- Forwarded message ----------
From: Ted Yu <yuzhih...@gmail.com>
Date: Mon, Mar 29, 2010 at 1:27 PM
Subject: Re: ClassNotFoundException with contrib/join example
To: common-u...@hadoop.apache.org

I modified DataJoinJob.createDataJoinJob() slightly:

    if (args[7].compareToIgnoreCase("text") != 0) {
        SequenceFileOutputFormat.setOutputCompressionType(job,
            SequenceFile.CompressionType.BLOCK);
    }
    job.setBoolean("mapred.output.compress", false);

But I still see output with a .deflate extension:

output/part-00000.deflate

'hadoop fs -text output/part-00000.deflate' doesn't show readable text either.

On Mon, Mar 29, 2010 at 9:26 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> I can run the sample (I created the input files according to
> contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/README.txt):
>
> [r...@tyu-linux datajoin]# pwd
> /opt/ks/hadoop-0.20.2/build/contrib/datajoin
> [r...@tyu-linux datajoin]# /opt/ks/hadoop-0.20.2/bin/hadoop jar
> hadoop-0.20.2-datajoin-examples.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob input output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Using TextInputFormat: Text
> Using TextOutputFormat: Text
> 10/03/29 09:01:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10/03/29 09:01:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process : 2
> Job job_local_0001 is submitted
> Job job_local_0001 is still running.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process : 2
> 10/03/29 09:01:31 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:31 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:31 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:31 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:31 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:31 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount 6
> totalCount 6
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
> 10/03/29 09:01:32 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:32 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:32 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:32 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:32 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount 5
> totalCount 5
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.Merger: Merging 2 sorted segments
> 10/03/29 09:01:32 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 939 bytes
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 10/03/29 09:01:32 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 10/03/29 09:01:32 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
> 10/03/29 09:01:32 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/opt/kindsight/hadoop-0.20.2/build/contrib/datajoin/output
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: actuallyCollectedCount 5
> collectedCount 7
> groupCount 6
>
> reduce
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
> [r...@tyu-linux datajoin]# date
> Mon Mar 29 09:02:37 PDT 2010
>
> It took a minute between the last INFO log line and the exit of DataJoinJob.
>
> Cheers
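If 'hadoop fs -text' still does not show readable text for part-00000.deflate, one fallback is to inflate the file with the JDK alone. This is a minimal sketch, assuming the file is a single zlib stream (the format used for .deflate output by Hadoop's DefaultCodec); DeflateCat is a hypothetical class name, and Hadoop's own CompressionCodec classes are deliberately not used so it runs without Hadoop on the classpath.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical tool (not part of Hadoop): prints a .deflate part file as text,
// assuming it holds one zlib-wrapped deflate stream.
final class DeflateCat {
    // Copies the decompressed bytes of a zlib stream from `in` to `out`.
    // The default Inflater expects the zlib header (0x78 ...).
    static void inflate(InputStream in, OutputStream out) throws IOException {
        try (InflaterInputStream z = new InflaterInputStream(in)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = z.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DeflateCat <file.deflate>");
            System.exit(2);
        }
        // Closing the InflaterInputStream also closes the FileInputStream.
        inflate(new FileInputStream(args[0]), System.out);
    }
}
```

Usage: java DeflateCat output/part-00000.deflate. If a file turned out to be raw deflate without the zlib header, the stream would need an Inflater constructed with nowrap set to true instead; and if several zlib streams were concatenated, this sketch would stop after the first one.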