hey guys,
i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs
daemons), and i'm observing that the map phase executes really quickly but the
reduce phase is really slow. the application simply reads some log files,
whose lines consist of key-value pairs, and summarizes them by key,
summing the values... so this seems like an ideal application of hadoop.
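for reference, the job logic amounts to roughly the following. this is only a plain-java sketch of the summing, not my actual hadoop job; the tab-separated "key<TAB>value" line format, the integer values, and the LogSummer / sumByKey names are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job: sums values per key, outside of Hadoop.
class LogSummer {

    // Assumes each line looks like "key<TAB>value" with an integer value.
    static Map<String, Long> sumByKey(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                continue; // skip lines without a tab separator
            }
            long value;
            try {
                value = Long.parseLong(parts[1].trim());
            } catch (NumberFormatException e) {
                continue; // skip lines whose value is not an integer
            }
            // Same work the reducer does: add this value to the key's running total.
            totals.merge(parts[0], value, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {a=4, b=2}
        System.out.println(sumByKey(Arrays.asList("a\t1", "b\t2", "a\t3")));
    }
}
```
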
could you suggest where the bottleneck might be? by logging, i observed that
it is not in my reducer implementation. could it be in the RPC? or the sort or
copying phases?
are there any particular properties that should be tweaked?
thanks and best regards,
mw