On 4/8/08 10:43 AM, "Natarajan, Senthil" <[EMAIL PROTECTED]> wrote:
> I would like to try using Hadoop.
That is good for education, probably bad for run time. It could take
SECONDS longer to run (oh my).
> Do you mean to write another MapReduce program which takes the output of the
> first MapReduce (the already existing file of this format)
Yes.
> And use count as the key and IP Address as the value.
Yes.
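A rough plain-Java sketch of what that second pass does. It assumes the first job wrote tab-separated "IP<TAB>count" lines (the default text output layout, but check your own job). The map step only swaps the two fields. Hadoop's shuffle then sorts by the new key, and here a TreeMap stands in for that step. The class and method names are hypothetical and this is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertCounts {

    // Map step: turn "10.0.0.1\t42" into the pair (42, "10.0.0.1").
    // Assumes the first job's output is "IP<TAB>count" per line.
    static Map.Entry<Long, String> invert(String line) {
        String[] parts = line.split("\t");
        String ip = parts[0];
        long count = Long.parseLong(parts[1].trim());
        return Map.entry(count, ip);
    }

    // Stand-in for Hadoop's shuffle/sort: group IPs under their count,
    // with keys in ascending order.
    static TreeMap<Long, List<String>> sortByCount(List<String> lines) {
        TreeMap<Long, List<String>> byCount = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<Long, String> kv = invert(line);
            byCount.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return byCount;
    }

    public static void main(String[] args) {
        List<String> firstJobOutput = List.of(
            "10.0.0.1\t42", "10.0.0.2\t7", "10.0.0.3\t42");
        System.out.println(sortByCount(firstJobOutput));
    }
}
```

Keys come out smallest count first. If you want the busiest IPs first, `descendingMap()` on the result reverses the order in this sketch.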
> Is it possible to do this in the same program instead of writing another one?
No.
> If it is not possible, is there something in Hadoop so that, once the first
> program is done, I can call a second program to do the sorting?
Yes. If you are using Java, just create a second job configuration whose input
is the first job's output directory, and run it the same way you ran the first
job.
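A plain-Java sketch of the ordering that matters when chaining. The second pass starts only after the first has returned, and its input is exactly the first pass's output. With the old `JobConf` API, `JobClient.runJob` waits for the job to finish, so calling it once per configuration gives the same sequencing. The methods below are hypothetical in-memory stand-ins for the two jobs, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoPassDriver {

    // Pass 1 (stands in for the existing job): count hits per IP address.
    static Map<String, Long> countPerIp(List<String> ips) {
        Map<String, Long> counts = new TreeMap<>();
        for (String ip : ips) {
            counts.merge(ip, 1L, Long::sum);
        }
        return counts;
    }

    // Pass 2 (stands in for the new job): consumes pass 1's output
    // and re-keys it by count.
    static TreeMap<Long, List<String>> orderByCount(Map<String, Long> counts) {
        TreeMap<Long, List<String>> byCount = new TreeMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        List<String> ips = List.of("10.0.0.1", "10.0.0.2", "10.0.0.1");
        // Run the passes strictly in sequence: the second one reads
        // what the first one produced.
        Map<String, Long> pass1 = countPerIp(ips);
        TreeMap<Long, List<String>> pass2 = orderByCount(pass1);
        System.out.println(pass2);
    }
}
```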
> If I set the number of reducers to 1, then it will take more time to reduce
> all the maps and hence affect the performance, right?
Not really. Most of the sorting work will be done by the mappers. The
reducer will only be merging their already-sorted outputs, so it will be
pretty fast. The largest cost will be startup time; the second largest will
be network transfer time.
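A plain-Java sketch of why a single reducer stays cheap. If each map output is already sorted, the reducer only needs a k-way merge of those sorted runs, which costs O(n log k) comparisons for n records and k runs instead of a full sort. This illustrates the idea with a heap; it is not Hadoop's actual merge code, and the runs in `main` are made-up counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSortedRuns {

    // Merge k individually sorted runs into one sorted list.
    static List<Long> merge(List<List<Long>> runs) {
        // Each heap entry is {value, runIndex, positionInRun}, ordered by value.
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new long[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Long> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            merged.add(top[0]);
            int run = (int) top[1];
            int next = (int) top[2] + 1;
            // Refill the heap from the run we just consumed, if it has more.
            if (next < runs.get(run).size()) {
                heap.add(new long[] {runs.get(run).get(next), run, next});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three "map outputs", each already sorted by its mapper.
        List<List<Long>> mapOutputs = List.of(
            List.of(1L, 4L, 9L), List.of(2L, 3L, 10L), List.of(5L));
        System.out.println(merge(mapOutputs));
    }
}
```

The heap only ever holds one entry per run, so memory use depends on the number of map outputs rather than on the total number of records.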