reduce quite slow

shangan Tue, 17 Aug 2010 02:47:24 -0700

I ran a "select count(*) from" query and it is quite slow. I tried setting 
mapred.reduce.tasks higher, but the number of reduce tasks never changes and 
stays at 1 (I can see this in the MapReduce administration web UI).
The map phase seems quite fast but the reduce phase is quite slow: 
the reduce progress jumps to about 20% in a short time and then 
stalls there for a long time. The following is the actual output:


    > select count(1) from log_game_farm_goods;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201008171643_0001, Tracking URL = 
http://vm153:50030/jobdetails.jsp?jobid=job_201008171643_0001
Kill Command = /home/shangan/bin/hadoop-0.20.2/bin/../bin/hadoop job  
-Dmapred.job.tracker=vm153:9001 -kill job_201008171643_0001
2010-08-17 16:44:27,100 Stage-1 map = 0%,  reduce = 0%
2010-08-17 16:44:34,146 Stage-1 map = 40%,  reduce = 0%
2010-08-17 16:44:38,169 Stage-1 map = 60%,  reduce = 0%
2010-08-17 16:44:40,189 Stage-1 map = 100%,  reduce = 0%
2010-08-17 16:44:43,210 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:45:43,637 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:46:44,060 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:47:44,455 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:48:44,835 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:49:45,170 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:50:45,446 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:51:45,790 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:52:42,033 Stage-1 map = 100%,  reduce = 20%
2010-08-17 16:52:51,070 Stage-1 map = 100%,  reduce = 27%
2010-08-17 16:53:51,410 Stage-1 map = 100%,  reduce = 27%
2010-08-17 16:54:51,669 Stage-1 map = 100%,  reduce = 27%
2010-08-17 16:55:51,933 Stage-1 map = 100%,  reduce = 27%
2010-08-17 16:56:52,321 Stage-1 map = 100%,  reduce = 27%
2010-08-17 16:57:52,605 Stage-1 map = 100%,  reduce = 27%
2010-08-17 16:58:52,913 Stage-1 map = 100%,  reduce = 27%
2010-08-17 16:59:53,168 Stage-1 map = 100%,  reduce = 27%
2010-08-17 17:00:53,452 Stage-1 map = 100%,  reduce = 27%
2010-08-17 17:01:53,764 Stage-1 map = 100%,  reduce = 27%
2010-08-17 17:02:54,008 Stage-1 map = 100%,  reduce = 27%
2010-08-17 17:03:09,085 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201008171643_0001
OK
24757
Time taken: 1128.692 seconds
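
To put numbers on the stall, here is a rough Python sketch that parses progress lines like the ones above and reports how long the reduce stayed at each percentage. The function name reduce_stalls and the regular expression are my own illustration, not part of Hive or Hadoop, and it assumes the progress lines keep exactly the format shown in this output (milliseconds are ignored).

```python
import re
from datetime import datetime

# Matches Hive CLI progress lines such as:
# 2010-08-17 16:44:43,210 Stage-1 map = 100%,  reduce = 13%
PROGRESS = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ Stage-\d+ "
    r"map = (\d+)%,\s+reduce = (\d+)%"
)


def reduce_stalls(log_text):
    """Return [(reduce_percent, seconds_spent_at_that_percent), ...]."""
    samples = []
    for match in PROGRESS.finditer(log_text):
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        samples.append((when, int(match.group(3))))

    stalls = []
    start = 0
    for i in range(1, len(samples) + 1):
        # Close a run when the reduce percentage changes or the log ends.
        if i == len(samples) or samples[i][1] != samples[start][1]:
            end_time = samples[i][0] if i < len(samples) else samples[i - 1][0]
            seconds = (end_time - samples[start][0]).total_seconds()
            stalls.append((samples[start][1], seconds))
            start = i
    return stalls


if __name__ == "__main__":
    # A few lines copied from the job output above.
    sample = """\
2010-08-17 16:44:40,189 Stage-1 map = 100%,  reduce = 0%
2010-08-17 16:44:43,210 Stage-1 map = 100%,  reduce = 13%
2010-08-17 16:52:42,033 Stage-1 map = 100%,  reduce = 20%
2010-08-17 16:52:51,070 Stage-1 map = 100%,  reduce = 27%
2010-08-17 17:03:09,085 Stage-1 map = 100%,  reduce = 100%
"""
    for percent, seconds in reduce_stalls(sample):
        print(f"reduce = {percent}%: {seconds:.0f} s")
```

For the run above this gives roughly 8 minutes at 13% and roughly 10 minutes at 27%, which accounts for almost all of the 1128 seconds.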


My cluster consists of only 4 nodes: 1 namenode and 3 datanodes. The network 
speed does not seem to be very good, and there are many fetch failures while 
the MapReduce job is running. I think this is the cause. If so, can anyone tell 
me what the network requirements are? Otherwise, please tell me the actual 
cause. Thanks a lot!

2010-08-17 



shangan
