I am not sure if you can do this in one job, since you have to do two sorts. You could run a second map/reduce job using InverseMapper and IdentityReducer (both part of the org.apache.hadoop.mapred.lib package). Calling setNumReduceTasks(1) on that job gives you a single final output file.
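For the word-frequency case, that second job can be wired up roughly like this. This is a sketch against the old org.apache.hadoop.mapred API, not code from this thread: WordCount.class and the "count-output"/"sorted-output" paths are placeholders for your own driver class and paths, and it assumes the first job wrote its (word, count) pairs as a SequenceFile. InverseMapper flips each pair to (count, word) so the framework's sort runs on the counts, and LongWritable.DecreasingComparator makes that sort descending.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.InverseMapper;

public class SortByFrequency {
  public static void main(String[] args) throws Exception {
    JobConf sortJob = new JobConf(WordCount.class);       // your driver class here
    sortJob.setJobName("sort-by-frequency");

    // Input: the (word, count) output of the first job, as SequenceFiles.
    FileInputFormat.setInputPaths(sortJob, new Path("count-output"));
    sortJob.setInputFormat(SequenceFileInputFormat.class);

    sortJob.setMapperClass(InverseMapper.class);          // swaps key and value
    sortJob.setReducerClass(IdentityReducer.class);       // passes pairs through unchanged
    sortJob.setNumReduceTasks(1);                         // one reducer => one output file

    sortJob.setOutputKeyClass(LongWritable.class);
    sortJob.setOutputValueClass(Text.class);
    // Highest counts first instead of the default ascending order.
    sortJob.setOutputKeyComparatorClass(LongWritable.DecreasingComparator.class);

    FileOutputFormat.setOutputPath(sortJob, new Path("sorted-output"));
    JobClient.runJob(sortJob);
  }
}
```

Since the map outputs are grouped and sorted by key before the reduce, the lone reducer sees the counts in comparator order and the single part file comes out sorted by frequency.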
For your original question about the missing file: I am not sure if this helps, but I hit a similar missing-jar-file problem before because I ran multiple jobs in one program and reused the JobConf object without resetting the job's jar file (setJarByClass). I think what happened was that hadoop would reset the location of the jar file to the tmp directory every time you run a job.

Eric Zhang
Vespa content @Yahoo!
Work: 408-349-2466

-----Original Message-----
From: Ross Boucher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 19, 2007 3:43 PM
To: [email protected]
Subject: Re: Running Custom Job

This problem seems to have gone away by itself. Now I have my job running, but I'm not entirely sure how to get the output into something useful to me. I'm counting word frequencies, and I would like the output sorted by frequency rather than alphabetically. I would also like the final output to be in one file, though I'm not sure if that's possible given that it's computed separately. I suppose it wouldn't be too difficult to post-process the files to get them sorted the way I would like and into one file, but if anyone has some tips on how to do this in the job itself, that would be great. Thanks.

Ross Boucher
[EMAIL PROTECTED]

On Sep 19, 2007, at 2:59 PM, Owen O'Malley wrote:
>
> On Sep 19, 2007, at 2:30 PM, Ross Boucher wrote:
>
>> Specifically, the job starts, and then each task that is scheduled
>> fails, with the following error:
>>
>> Error initializing task_0007_m_000063_0:
>> java.io.IOException: /DFS_ROOT/tmp/mapred/system/submit_i849v1/
>> job.xml: No such file or directory
>
> Look at the configuration of your mapred.system.dir. It MUST be the
> same on both the cluster and the submitting node. Note that
> mapred.system.dir must be in the default file system, which must
> also be the same on the cluster and the submitting node. Note that
> there is a jira (HADOOP-1100) that would have the cluster pass the
> system directory to the client, which would get rid of this issue.
>
> -- Owen
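To make Owen's constraint concrete: the submitting node and the cluster must agree on both the default file system and the system directory inside it. A hedged sketch of the relevant hadoop-site.xml entries (the hostname, port, and path are placeholders, not values from this thread):

```xml
<configuration>
  <!-- Must be identical on the cluster and on the submitting node. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:9000</value>
  </property>
  <!-- Must live in the default file system named above; also must match
       between the cluster and the submitting node. -->
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/mapred/system</value>
  </property>
</configuration>
```

If the submitter's copy of either value differs, the job files land in a directory the tasktrackers never look at, which matches the "job.xml: No such file or directory" error above.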
