I use something like this:

    bin/hadoop -getmerge <output-directory> | sort +1n

This works very well because the final counts are relatively small compared to the original input. There is nothing that says you can't mix MR programming with conventional code.

On 9/19/07 3:42 PM, "Ross Boucher" <[EMAIL PROTECTED]> wrote:

> This problem seems to have gone away by itself.
>
> Now I have my job running, but I'm not entirely sure how to get the
> output into something useful to me.
>
> I'm counting word frequencies, and I would like the output sorted by
> frequency, rather than alphabetically. I would also like the final
> output to be in one file, though I'm not sure if this is possible
> given that it's computed separately. I suppose it wouldn't be too
> difficult to post-process the files to get them sorted the way I
> would like and in one file, but if anyone has some tips on how to do
> this in my job itself, that would be great.
>
> Thanks.
>
> Ross Boucher
> [EMAIL PROTECTED]
>
>
> On Sep 19, 2007, at 2:59 PM, Owen O'Malley wrote:
>
>>
>> On Sep 19, 2007, at 2:30 PM, Ross Boucher wrote:
>>
>>> Specifically, the job starts, and then each task that is scheduled
>>> fails, with the following error:
>>>
>>> Error initializing task_0007_m_000063_0:
>>> java.io.IOException: /DFS_ROOT/tmp/mapred/system/submit_i849v1/
>>> job.xml: No such file or directory
>>
>> Look at the configuration of your mapred.system.dir. It MUST be the
>> same on both the cluster and submitting node. Note that
>> mapred.system.dir must be in the default file system, which must
>> also be the same on the cluster and submitting node. Note that
>> there is a jira (HADOOP-1100) that would have the cluster pass the
>> system directory to the client, which would get rid of this issue.
>>
>> -- Owen
>
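
Coming back to the sorting question: if you would rather do the post-processing in a small script than with sort, here is a minimal sketch in Python. It rests on a few assumptions that are not from the thread: the job output has already been copied to a local directory, the files are named part-*, and each line is a word and its count separated by a tab. The function names are made up for illustration and nothing here is part of Hadoop.

```python
import glob
import os


def read_counts(lines):
    """Parse 'word<TAB>count' lines into (word, count) pairs, skipping blank lines."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so a word containing a tab still parses.
        word, count = line.rsplit("\t", 1)
        pairs.append((word, int(count)))
    return pairs


def sort_by_frequency(pairs):
    """Order pairs by count, highest first; ties are broken alphabetically."""
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


def merge_and_sort(local_dir):
    """Read every part-* file in local_dir (assumed layout) into one sorted list."""
    pairs = []
    for path in sorted(glob.glob(os.path.join(local_dir, "part-*"))):
        with open(path) as f:
            pairs.extend(read_counts(f))
    return sort_by_frequency(pairs)
```

Calling merge_and_sort on the local copy of the output directory and printing each pair gives a single list ordered by frequency. The sketch does not add counts together across files, on the assumption that each word ends up in exactly one reducer's output, and it holds everything in memory, which is only reasonable because the final counts are small.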
