hey guys,
i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs
daemons), and i'm observing that the map phase executes really quickly but the
reduce phase is really slow. the application simply reads some log files,
whose lines consist of key-value pairs, and
What version of Hadoop are you using? On what sort of a cluster? How
big is your dataset?
Doug
moonwatcher wrote:
hey guys,
i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs daemons), and i'm observing that the map phase executes really quickly but the reduce phase
i am using hadoop-0.12.3.
i am using a single-node cluster -- running the hdfs daemons, a job tracker,
and a task tracker.
the dataset is about 12 MB of log files.
other information:
during the map phase, cpu usage went really high, close to 100%.
during the reduce phase, cpu usage was near zero, usually 1% or