slowness in hadoop reduce phase when using distributed mode

2007-05-03 Thread moonwatcher
hey guys, i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs daemons), and i'm observing that the map phase executes really quickly but the reduce phase is really slow. the application simply reads some log files, whose lines consist of key-value pairs, and

Re: slowness in hadoop reduce phase when using distributed mode

2007-05-03 Thread Doug Cutting
What version of Hadoop are you using? On what sort of a cluster? How big is your dataset?

Doug

moonwatcher wrote:
hey guys, i've setup hadoop in distributed mode (jobtracker, tasktracker, and hdfs daemons), and observing that the map phase executes really quickly but the reduce phase

Re: slowness in hadoop reduce phase when using distributed mode

2007-05-03 Thread moonwatcher
i am using hadoop-0.12.3 on a single-node cluster, running the hdfs daemons, job tracker, and task tracker. the dataset is about 12 MB of log files. other information: during the map phase, cpu went really high, close to 100%. during the reduce phase, cpu was near zero, usually 1% or