I am not sure if you can do this in one job, since you have to do two
sorts. You could run a second round of map/reduce using InverseMapper
and IdentityReducer (both in the org.apache.hadoop.mapred.lib package).
Calling setNumReduceTasks(1) on that job will give you a single final
output file.
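To make the idea concrete, here is a plain-JDK sketch (no Hadoop dependencies, names are illustrative) of what that second pass effectively does: InverseMapper swaps each (word, count) pair to (count, word), and the framework then sorts records by the new key before the IdentityReducer writes them out. Note that Hadoop's default key ordering is ascending, so the most frequent words come last; to get descending order you would plug in a decreasing comparator via setOutputKeyComparatorClass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation of the second map/reduce pass: invert (word, count) to
// (count, word), then sort by the count key, ascending, as Hadoop's
// shuffle does by default.
public class InvertAndSort {

    public static List<Map.Entry<Integer, String>> invertAndSort(Map<String, Integer> counts) {
        List<Map.Entry<Integer, String>> inverted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // the "map" step: emit (count, word) instead of (word, count)
            inverted.add(Map.entry(e.getValue(), e.getKey()));
        }
        // the framework's sort-by-key step (ascending by default)
        inverted.sort((a, b) -> Integer.compare(a.getKey(), b.getKey()));
        return inverted;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("apple", 3);
        counts.put("banana", 7);
        counts.put("cherry", 1);
        for (Map.Entry<Integer, String> e : invertAndSort(counts)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```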

For your original question about the missing file: I am not sure if this
helps, but I had a similar problem with a missing jar file before. I was
running multiple jobs in one program and reusing the JobConf object
without resetting the job's jar file (setJarByClass). I think what
happened was that Hadoop reset the location of the jar file to the tmp
directory every time a job ran.
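If you are in the same situation, a sketch of the workaround looks like the following (old org.apache.hadoop.mapred API, circa 0.14; MyDriver is a placeholder for your own driver class, and this of course needs the Hadoop jars on the classpath):

```java
// Reusing one JobConf across submissions: re-point it at your job jar
// before each run, since the previous submission may have rewritten the
// jar location to a tmp path.
JobConf conf = new JobConf(MyDriver.class);  // MyDriver is hypothetical
JobClient.runJob(conf);                      // first job

conf.setJarByClass(MyDriver.class);          // reset the jar before reuse
JobClient.runJob(conf);                      // second job
```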


Eric Zhang
Vespa content @Yahoo!
Work: 408-349-2466
 

-----Original Message-----
From: Ross Boucher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 3:43 PM
To: [email protected]
Subject: Re: Running Custom Job

This problem seems to have gone away by itself.

Now I have my job running, but I'm not entirely sure how to get the output
into something useful to me.

I'm counting word frequencies, and I would like the output sorted by
frequency rather than alphabetically.  I would also like the final
output to be in one file, though I'm not sure if this is possible
given that it's computed separately.  I suppose it wouldn't be too
difficult to post-process the files to get them sorted the way I would
like and into one file, but if anyone has tips on how to do this in the
job itself, that would be great.

Thanks.

Ross Boucher
[EMAIL PROTECTED]


On Sep 19, 2007, at 2:59 PM, Owen O'Malley wrote:

>
> On Sep 19, 2007, at 2:30 PM, Ross Boucher wrote:
>
>> Specifically, the job starts, and then each task that is scheduled  
>> fails, with the following error:
>>
>> Error initializing task_0007_m_000063_0:
>> java.io.IOException: /DFS_ROOT/tmp/mapred/system/submit_i849v1/ 
>> job.xml: No such file or directory
>
> Look at the configuration of your mapred.system.dir. It MUST be the  
> same on both the cluster and submitting node. Note that  
> mapred.system.dir must be in the default file system, which must  
> also be the same on the cluster and submitting node. Note that  
> there is a jira (HADOOP-1100) that would have the cluster pass the  
> system directory to the client, which would get rid of this issue.
>
> -- Owen

