I use something like this:

  bin/hadoop dfs -getmerge <output-directory> <local-file>
  sort -k2,2n <local-file>

This works very well because the final counts are small compared to the
original input, so sorting them on a single machine is cheap.  There is
nothing that says you can't mix MapReduce programming with conventional
code.
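
For anyone who would rather do that post-processing step in Java than
with `sort`, here is a minimal sketch.  It assumes every output line has
the form `word<TAB>count` (the layout the word-count example is assumed
to write) and that the part files have already been merged into one
local file, e.g. with `-getmerge`.  The class name `WordCountSort` is
made up for illustration.  Unlike the ascending numeric `sort` above, it
puts the most frequent words first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: sorts merged word-count output by frequency, most frequent first.
// Assumes every non-empty line looks like "word<TAB>count".
public class WordCountSort {

    // Extracts the numeric count after the last tab on the line
    // (throws NumberFormatException if the line does not fit the assumed layout).
    public static long countOf(String line) {
        int tab = line.lastIndexOf('\t');
        return Long.parseLong(line.substring(tab + 1).trim());
    }

    // Drops blank lines, then orders by descending count; equal counts fall back
    // to String.compareTo on the whole line, which starts with the word.
    public static List<String> sortByCount(List<String> lines) {
        List<String> sorted = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                sorted.add(line);
            }
        }
        sorted.sort((a, b) -> {
            int byCount = Long.compare(countOf(b), countOf(a)); // descending by count
            return byCount != 0 ? byCount : a.compareTo(b);     // tie-break on the text
        });
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordCountSort <merged-output-file>");
            System.exit(1);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String line : sortByCount(lines)) {
            System.out.println(line);
        }
    }
}
```

Run it as `java WordCountSort <local-file>` after the merge.  Sorting in
memory like this is only sensible for the same reason the shell version
works: the final counts are small.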

On 9/19/07 3:42 PM, "Ross Boucher" <[EMAIL PROTECTED]> wrote:

> This problem seems to have gone away by itself.
> 
> Now I have my job running, but I'm not entirely sure how to get the
> output into something useful to me.
> 
> I'm counting word frequencies, and I would like the output sorted by
> frequency rather than alphabetically.  I would also like the final
> output to be in one file, though I'm not sure if this is possible
> given that each reduce task writes its own output file.  I suppose it
> wouldn't be too difficult to post-process the files to get them sorted
> the way I would like and in one file, but if anyone has tips on how to
> do this in the job itself, that would be great.
> 
> Thanks.
> 
> Ross Boucher
> [EMAIL PROTECTED]
> 
> 
> On Sep 19, 2007, at 2:59 PM, Owen O'Malley wrote:
> 
>> 
>> On Sep 19, 2007, at 2:30 PM, Ross Boucher wrote:
>> 
>>> Specifically, the job starts, and then each task that is scheduled
>>> fails, with the following error:
>>> 
>>> Error initializing task_0007_m_000063_0:
>>> java.io.IOException: /DFS_ROOT/tmp/mapred/system/submit_i849v1/
>>> job.xml: No such file or directory
>> 
>> Look at the configuration of your mapred.system.dir. It MUST be the
>> same on both the cluster and the submitting node. Note that
>> mapred.system.dir must be in the default file system, which must
>> also be the same on the cluster and the submitting node. A JIRA
>> issue (HADOOP-1100) proposes having the cluster pass the system
>> directory to the client, which would get rid of this problem.
>> 
>> -- Owen
> 
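
Owen's advice about mapred.system.dir can be checked mechanically.  The
sketch below is a hypothetical helper (`SystemDirCheck` is not part of
Hadoop) that reads two Hadoop-style XML config files, say the copy on
the submitting node and a copy taken from a cluster node, and reports
whether mapred.system.dir or the default file system setting differ.
The default file system property name `fs.default.name` is an
assumption about Hadoop versions of this era.  It only compares the two
files you pass it: it does not apply defaults from other config files
and does not expand `${...}` variable references, so a clean result is a
hint rather than proof.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical diagnostic, not part of Hadoop: compares settings that must agree
// between the submitting node and the cluster.
public class SystemDirCheck {

    // Assumed property names: the job system directory and the default file system.
    static final String[] MUST_MATCH = {"mapred.system.dir", "fs.default.name"};

    // Collects <property><name>..</name><value>..</value></property> pairs from one config file.
    public static Map<String, String> parse(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element prop = (Element) nodes.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot read configuration", e);
        }
    }

    // Reports each must-match property whose values differ;
    // a key absent from both files counts as matching.
    public static List<String> mismatches(Map<String, String> client, Map<String, String> cluster) {
        List<String> problems = new ArrayList<>();
        for (String key : MUST_MATCH) {
            String a = client.get(key);
            String b = cluster.get(key);
            if (!Objects.equals(a, b)) {
                problems.add(key + ": client=" + a + " cluster=" + b);
            }
        }
        return problems;
    }

    static Map<String, String> parseFile(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            return parse(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SystemDirCheck <client-config.xml> <cluster-config.xml>");
            System.exit(2);
        }
        List<String> problems = mismatches(parseFile(args[0]), parseFile(args[1]));
        problems.forEach(System.out::println);
        System.exit(problems.isEmpty() ? 0 : 1);
    }
}
```

The exit status is 0 when the two files agree on both properties and 1
otherwise, so the check can be dropped into a job-submission script.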
