What version of Hadoop are you using? On what sort of a cluster? How
big is your dataset?
Doug
moonwatcher wrote:
hey guys,
i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs daemons), and i'm observing that the map phase executes really quickly but the reduce phase is really slow. the application simply reads some log files, whose lines consist of key-value pairs, and summarizes them by key, summing the values... so this seems like an ideal application of hadoop.
could you suggest where the bottleneck might be? by logging, i observed that it is not in my reducer implementation. could it be in the RPC? or in the sort or copy phases?
are there any particular properties that should be tweaked?
thanks and best regards,
mw
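
For readers following the thread: with a sum-by-key job like the one described above, one thing commonly checked is whether map output is pre-aggregated before it is copied to the reducers (in Hadoop this is the role of a combiner, set with JobConf.setCombinerClass). The sketch below is not from the original message and is not Hadoop code. It is a plain Python simulation, assuming a hypothetical "key value" line format and made-up input splits, that only illustrates how pre-summing on the map side shrinks the number of records that would go through the copy and sort phases. Whether this is the actual bottleneck here cannot be told from the thread.

```python
from collections import defaultdict

def map_lines(lines):
    """Map step: parse 'key value' lines (assumed format) into (key, int) pairs."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines
        yield parts[0], int(parts[1])

def combine(pairs):
    """Combiner step: pre-sum values per key within one map task's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_all(shuffled):
    """Reduce step: sum all values per key across every map task's output."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, one per simulated map task.
splits = [
    ["a 1", "b 2", "a 3", "a 4"],
    ["b 5", "a 6", "b 7"],
]

# Records that would be copied to the reducers in each case.
without_combiner = [p for s in splits for p in map_lines(s)]
with_combiner = [p for s in splits for p in combine(map_lines(s))]

print(len(without_combiner), len(with_combiner))
print(reduce_all(without_combiner) == reduce_all(with_combiner))
```

Because addition is associative and commutative, the final sums are the same either way; only the volume of intermediate data changes.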