i am using hadoop-0.12.3.
i am using a single-node cluster, running the hdfs daemons, job tracker, and task
tracker.
the dataset is about 12 MB of log files.
other information:
during the map phase, cpu usage went really high, close to 100%.
during the reduce phase, cpu usage was near zero, usually 1% or 2%. but the reduce
phase did complete eventually and produced the correct output. this
behaviour is consistent.
thanks,
mw
Doug Cutting <[EMAIL PROTECTED]> wrote:
What version of Hadoop are you using? On what sort of a cluster? How
big is your dataset?
Doug
moonwatcher wrote:
> hey guys,
>
> i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs
> daemons), and am observing that the map phase executes really quickly but the
> reduce phase is really slow. the application simply reads some log
> files, whose lines consist of key-value pairs, and summarizes them by
> key, summing the values... so this seems like an ideal application of
> hadoop.
>
> could you suggest where the bottleneck might be? by logging, i observed that
> it is not in my reducer implementation. could it be in the RPC, or in the
> copy or sort phases?
> are there any particular properties that should be tweaked?
>
> thanks and best regards,
> mw
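For reference, the job described in the original message (read log lines of key-value pairs, sum the values per key) can be sketched in plain Java without any Hadoop classes. This is a hypothetical in-memory illustration. It is not the poster's code and not the Hadoop 0.12.3 API. It assumes each line looks like `key<TAB>integer-value`, and the names `SumByKey`, `mapAndCombine`, and `reduce` are made up for the example. Each input split is summed locally first, and the reduce step then merges the per-split partial sums.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Hadoop API): simulates map -> group -> reduce in memory.
// Assumption: each log line is "key<TAB>value" where value is an integer.
public class SumByKey {

    // Map side: parse one split's lines and keep a running partial sum per key.
    public static Map<String, Long> mapAndCombine(List<String> splitLines) {
        Map<String, Long> partial = new TreeMap<>();
        for (String line : splitLines) {
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                continue; // assumed policy: skip lines without a tab separator
            }
            long value = Long.parseLong(parts[1].trim());
            partial.merge(parts[0], value, Long::sum);
        }
        return partial;
    }

    // Reduce side: merge the partial sums from every split into final totals.
    // TreeMap keeps keys sorted, loosely mirroring the sort before the reduce.
    public static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> split1 = mapAndCombine(List.of("a\t1", "b\t2"));
        Map<String, Long> split2 = mapAndCombine(List.of("a\t3"));
        System.out.println(reduce(List.of(split1, split2))); // prints {a=4, b=2}
    }
}
```

Because addition is associative and commutative, summing within each map task before the shuffle gives the same totals as summing everything in the reducer, while sending fewer intermediate records to the reduce phase. In Hadoop, that map-side step is the job of a combiner.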