The manual way is to copy the split files to your local filesystem
using 'hadoop fs -copyToLocal'. You could also write code to read
that data from HDFS directly.
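
For example, assuming the job wrote its output under /user/me/output
(a hypothetical path -- substitute your job's actual output directory),
copying a part file down and inspecting it might look like:

```shell
# Copy one reducer part file from HDFS to the local filesystem
hadoop fs -copyToLocal /user/me/output/part-00000 ./part-00000

# Inspect the first few key/value lines locally
head ./part-00000
```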

What I do is set the reduce output to be in SequenceFile format, and
then create a new SequenceFile.Reader to read the split files from
HDFS.
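
A minimal sketch of that approach, assuming the reducer emitted
Text keys and IntWritable values (the path and key/value classes here
are assumptions -- use whatever types your job actually wrote):

```java
// Read a SequenceFile part file directly from HDFS and sum its values.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ReadOutput {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical output path; point this at your job's part file
        Path part = new Path("/user/me/output/part-00000");

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        try {
            Text key = new Text();
            IntWritable value = new IntWritable();
            long total = 0;
            while (reader.next(key, value)) {
                // e.g. for word count output: sum all the counts
                total += value.get();
            }
            System.out.println("total = " + total);
        } finally {
            reader.close();
        }
    }
}
```

The reduce output ends up in SequenceFile format if the job config
sets the output format to SequenceFileOutputFormat before submitting.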

Calvin


On 8/15/07, Jeroen Verhagen <[EMAIL PROTECTED]> wrote:
> Hi Sebastien,
>
> On 8/14/07, Sebastien Rainville <[EMAIL PROTECTED]> wrote:
> >
> > I am new to Hadoop. Looking at the documentation, I figured out how to
> > write map and reduce functions but now I'm stuck... How do we work with
> > the output file produced by the reducer? For example, the word count
> > example produces a file with words as keys and the number of occurrences
> > of each word as the values. Now, let's say I want to get the total
> > number of words by analyzing the output file... how am I supposed to do
> > it?
>
> I asked a similar question some time ago and haven't had any response
> so far, so I hope you will get one.
>
> Regarding your particular question, assuming each line in the output
> files contains exactly one word, counting the number of lines in the
> output files would give the answer you're looking for.
>
> But if you're looking for the count of a particular word, I wonder
> whether scanning through the output files for a line that starts with
> the word you're looking for is an efficient solution.
>
> --
>
> regards,
>
> Jeroen
>
