I modified DataJoinJob.createDataJoinJob() slightly:
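    if (args[7].compareToIgnoreCase("text") != 0) {
        // Only configure BLOCK compression when the output is not plain text.
        SequenceFileOutputFormat.setOutputCompressionType(job,
            SequenceFile.CompressionType.BLOCK);
    }
    // Disable output compression; set after the if, so it applies to every output format.
    job.setBoolean("mapred.output.compress", false);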

But I still see non-text output:
output/part-00000.deflate

Running 'hadoop fs -text output/part-00000.deflate' doesn't show readable
text either.
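
One way to look inside that file outside of Hadoop is a small Java program like the sketch below. It assumes part-00000.deflate is a single zlib-wrapped DEFLATE stream (which is what Hadoop's DefaultCodec is believed to produce) and that it has been copied to local disk first, e.g. with 'hadoop fs -get'. The class name InflatePart is made up for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.InflaterInputStream;

/**
 * Sketch: decompress a local copy of a Hadoop ".deflate" part file to stdout.
 * Assumption: the file holds one zlib-wrapped DEFLATE stream.
 */
public class InflatePart {

    /** Inflates the zlib stream read from compressed and writes the bytes to out. */
    static void inflate(InputStream compressed, OutputStream out) throws IOException {
        // The default Inflater used here expects a zlib header (nowrap = false).
        try (InputStream in = new InflaterInputStream(compressed)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java InflatePart <local .deflate file>");
            System.exit(1);
        }
        inflate(new FileInputStream(args[0]), System.out);
    }
}
```

Usage: java InflatePart part-00000.deflate. If it fails with a java.util.zip.ZipException, the zlib assumption does not hold for that file.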

On Mon, Mar 29, 2010 at 9:26 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> I can run the sample (I created the input files according to
> contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/README.txt):
>
> [r...@tyu-linux datajoin]# pwd
> /opt/ks/hadoop-0.20.2/build/contrib/datajoin
> [r...@tyu-linux datajoin]# /opt/ks/hadoop-0.20.2/bin/hadoop jar
> hadoop-0.20.2-datajoin-examples.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob input output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Using TextInputFormat: Text
> Using TextOutputFormat: Text
> 10/03/29 09:01:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 10/03/29 09:01:30 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process : 2
> Job job_local_0001 is submitted
> Job job_local_0001 is still running.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process : 2
> 10/03/29 09:01:31 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:31 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:31 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:31 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:31 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:31 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_m_000000_0 is done. And is in the process of
> commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    6
> totalCount      6
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_m_000000_0' done.
> 10/03/29 09:01:32 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:32 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:32 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:32 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:32 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    5
> totalCount      5
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_m_000001_0' done.
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.Merger: Merging 2 sorted segments
> 10/03/29 09:01:32 INFO mapred.Merger: Down to the last merge-pass, with 2
> segments left of total size: 939 bytes
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 10/03/29 09:01:32 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 10/03/29 09:01:32 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_r_000000_0 is done. And is in the process of
> commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> attempt_local_0001_r_000000_0 is allowed to commit now
> 10/03/29 09:01:32 INFO mapred.FileOutputCommitter: Saved output of task
> 'attempt_local_0001_r_000000_0' to
> file:/opt/kindsight/hadoop-0.20.2/build/contrib/datajoin/output
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: actuallyCollectedCount    5
> collectedCount  7
> groupCount      6
>  > reduce
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_r_000000_0' done.
> [r...@tyu-linux datajoin]# date
> Mon Mar 29 09:02:37 PDT 2010
>
> It took about a minute between the last INFO log line and the exit of
> DataJoinJob.
>
> Cheers
>
>
> On Mon, Mar 29, 2010 at 8:26 AM, M B <machac...@gmail.com> wrote:
>
>> Sorry, I should have mentioned that I tried that as well and it also gives
>> an error:
>>
>> $ <p...@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
>> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
>> datajoin/output Text 1
>> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
>> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
>> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
>> Exception in thread "main" java.io.IOException: Error opening job jar:
>> -libjars
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
>> Caused by: java.util.zip.ZipException: error in opening zip file
>>        at java.util.zip.ZipFile.open(Native Method)
>>        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>>        at java.util.jar.JarFile.<init>(JarFile.java:133)
>>        at java.util.jar.JarFile.<init>(JarFile.java:70)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
>> Has something changed, or is my environment not set up correctly? I'd
>> appreciate any help.
>>
>>
>>
>> On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > Then use the syntax given by
>> > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html :
>> >
>> > $ bin/hadoop jar -libjars ./samplejoin.jar
>> > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
>> >
>> > On Fri, Mar 26, 2010 at 5:10 PM, M B <machac...@gmail.com> wrote:
>> >
>> > > Sorry, but where exactly do I include the libjars option? I tried to put
>> > > it where you stated (after the DataJoinJob class), but it just comes back
>> > > with usage information (as if the option is not valid):
>> > > $ <p...@hadoop01:~/hadoop_tests$> hadoop jar
>> > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
>> > > datajoin/input datajoin/output Text 1
>> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
>> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
>> > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
>> > > usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
>> > > mapper_class reducer_class map_output_value_class output_value_class
>> > > [maxNumOfValuesPerGroup [descriptionOfJob]]]
>> > >
>> > > It seems like it's not taking the option for some reason, as if it's
>> > > failing an argument check in DataJoinJob. Does DataJoinJob not use the
>> > > standard argument handling?
>> > >
>> > >
>> > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > >
>> > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar, which is in your
>> > > > HADOOP_CLASSPATH.
>> > > >
>> > > > I think you should specify samplejoin.jar using -libjars instead of
>> > > > putting it directly after the jar command:
>> > > > hadoop jar hadoop-0.20.2-datajoin.jar
>> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
>> > > > ... (same as your example)
>> > > >
>> > > > Cheers
>> > > >
>> > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <machac...@gmail.com> wrote:
>> > > >
>> > > > > I may be having a setup issue with classpaths; I'd appreciate some
>> > > > > help.
>> > > > >
>> > > > > I created a jar with all the Sample* classes in contrib/DataJoin. Here
>> > > > > is the listing of my samplejoin.jar file:
>> > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
>> > > > > META-INF/
>> > > > > META-INF/MANIFEST.MF
>> > > > > org/
>> > > > > org/apache/
>> > > > > org/apache/hadoop/
>> > > > > org/apache/hadoop/contrib/
>> > > > > org/apache/hadoop/contrib/utils/
>> > > > > org/apache/hadoop/contrib/utils/join/
>> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
>> > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
>> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
>> > > > >
>> > > > > When I go to run this, things start to run, but every map task attempt
>> > > > > errors out with:
>> > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
>> > > > >
>> > > > > Here is the command:
>> > > > > hadoop jar ./samplejoin.jar
>> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
>> > > > > datajoin/input datajoin/output Text 1
>> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
>> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
>> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
>> > > > >
>> > > > > This is a new install of 0.20.2.
>> > > > >
>> > > > > HADOOP_CLASSPATH is set to:
>> > > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> > > > >
>> > > > > Any help would be appreciated.
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
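
The two -libjars failures in the quoted messages are consistent with how arguments are consumed. With 'hadoop jar -libjars ...', the first word after 'jar' was taken as the job jar ("Error opening job jar: -libjars"). When -libjars followed the DataJoinJob class name, DataJoinJob printed its own usage message. Ted's log also contains the JobClient warning "Applications should implement Tool for the same", which suggests DataJoinJob reads its arguments directly rather than through GenericOptionsParser. The Java sketch below is a simplified, hypothetical model (GenericArgsSketch is not a Hadoop class) of the documented GenericOptionsParser convention "[genericOptions] [commandOptions]": generic options are taken from the front of the argument list, and everything after them is left for the application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the GenericOptionsParser convention
 * "[genericOptions] [commandOptions]". Not Hadoop's implementation.
 */
public class GenericArgsSketch {

    // A subset of Hadoop's generic options; in this model each takes exactly one value.
    private static final List<String> GENERIC_OPTIONS =
            Arrays.asList("-libjars", "-files", "-archives", "-conf", "-fs", "-jt");

    static final class Split {
        final List<String> generic;
        final List<String> remaining;

        Split(List<String> generic, List<String> remaining) {
            this.generic = generic;
            this.remaining = remaining;
        }
    }

    static Split split(String[] args) {
        List<String> generic = new ArrayList<>();
        int i = 0;
        // Consume option/value pairs from the front and stop at the first token that
        // is not a known generic option (unknown tokens also stop it in this model).
        while (i < args.length && GENERIC_OPTIONS.contains(args[i])) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing value for " + args[i]);
            }
            generic.add(args[i]);
            generic.add(args[i + 1]);
            i += 2;
        }
        // Everything from here on, including any later "-libjars", goes to the application.
        List<String> remaining = new ArrayList<>(Arrays.asList(args).subList(i, args.length));
        return new Split(generic, remaining);
    }

    public static void main(String[] args) {
        // What a Tool-based main class would receive after 'hadoop jar <jar> <mainClass>'.
        String[] cmd = {"-libjars", "./samplejoin.jar", "datajoin/input", "datajoin/output", "Text", "1"};
        Split s = split(cmd);
        System.out.println("generic:   " + s.generic);
        System.out.println("remaining: " + s.remaining);
    }
}
```

In this model, placing -libjars right after the main class name leaves the positional arguments intact, but only if the main class actually applies the parser (for example via ToolRunner). An application that reads args[] directly sees "-libjars" as its first positional argument.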

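To rule out a packaging problem behind the ClassNotFoundException for SampleTaggedMapOutput quoted above, one can check that a jar really contains the class files named on the command line. The sketch below (JarClassCheck is a made-up name) uses only java.util.jar. It confirms what a jar ships, not whether that jar reaches the map tasks' classpath, which depends on how the job is submitted.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Sketch: report whether a jar contains the class files for the given
 * fully qualified (binary) class names.
 */
public class JarClassCheck {

    /** Maps a binary class name such as a.b.C to its jar entry a/b/C.class. */
    static boolean containsClass(JarFile jar, String className) {
        String entryName = className.replace('.', '/') + ".class";
        return jar.getJarEntry(entryName) != null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JarClassCheck <jar> <class> [<class> ...]");
            System.exit(1);
        }
        try (JarFile jar = new JarFile(args[0])) {
            for (int i = 1; i < args.length; i++) {
                String status = containsClass(jar, args[i]) ? "present" : "MISSING";
                System.out.println(args[i] + ": " + status);
            }
        }
    }
}
```

For example: java JarClassCheck samplejoin.jar org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput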