One of my task is to calculate some statistics from a very large amount of log
files for our customers. We are trying out hadoop to solve this problem.
>From what I can see, it is a perfect problem for hadoop designed to solve.
The mapper and reducer code are very straight ward. But when we try to run it
on a two node cluster, it is surprisingly slow. It has been running for three
hours and did not finish half of 250M log files. And, I am not seeing much disk
or network usage.
I want to know if there is something I should check. (Maybe some
configurations?)
Any help will be appreciated. If there is more information needed, let me know.
Thanks in advance
---------------------------------
Ahhh...imagining that irresistible "new car" smell?
Check outnew cars at Yahoo! Autos.