Re: scaling issue, please help

Amar Kamat Thu, 03 Jul 2008 00:56:13 -0700

Mori Bellamy wrote:

i discovered that some of my code was causing out of bounds exceptions. i cleaned up that code and the map tasks seemed to work. that confuses me -- i'm pretty sure hadoop is resilient to a few map tasks failing (5 out of 13k). before this fix, my remaining 2% of tasks were getting killed.

Mori, I am not sure what the confusion is. Hadoop is resilient to a few task failures, but not by default. The parameters that control this are mapred.max.map.failures.percent and mapred.max.reduce.failures.percent. Every task internally consists of attempts (internal to the framework). Hadoop allows some attempt failures too. If the number of failed attempts of a task exceeds the threshold (mapred.map.max.attempts / mapred.reduce.max.attempts; default is 4), then the task is considered failed. If the number of map/reduce task failures exceeds the threshold (mapred.max.map.failures.percent / mapred.max.reduce.failures.percent; default is 0), then the job is considered failed.
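
The two thresholds described above can be illustrated with a small standalone sketch. It is not Hadoop code: the class and method names are hypothetical, and it assumes a task counts as failed once its failed attempts reach the attempt limit and that a job fails once the share of failed tasks exceeds the configured percentage. The framework's exact comparison may differ.

```java
// Simplified model of the failure thresholds discussed above.
// Hypothetical names; this does not use or reproduce any Hadoop API.
final class FailureThresholds {
    // Defaults as quoted in the message above.
    static final int DEFAULT_MAX_ATTEMPTS = 4;         // mapred.map.max.attempts / mapred.reduce.max.attempts
    static final int DEFAULT_MAX_FAILURES_PERCENT = 0; // mapred.max.map.failures.percent / mapred.max.reduce.failures.percent

    // Assumption: a task is failed once its failed attempts reach the limit.
    static boolean taskFailed(int failedAttempts, int maxAttempts) {
        return failedAttempts >= maxAttempts;
    }

    // Assumption: the job fails when failed tasks exceed the allowed
    // percentage of all tasks. Integer math avoids rounding issues.
    static boolean jobFailed(long failedTasks, long totalTasks, int maxFailuresPercent) {
        return failedTasks * 100 > totalTasks * maxFailuresPercent;
    }

    public static void main(String[] args) {
        long totalMaps = 13_000; // roughly the 13k map tasks from the thread
        long failedMaps = 5;     // the 5 failing map tasks from the thread

        // With the default of 0 percent, any failed task fails the job.
        System.out.println(jobFailed(failedMaps, totalMaps, DEFAULT_MAX_FAILURES_PERCENT)); // true

        // Allowing 1 percent tolerates up to 130 failed maps out of 13,000.
        System.out.println(jobFailed(failedMaps, totalMaps, 1)); // false
    }
}
```

Under these assumptions, 5 failed map tasks out of 13k are enough to fail the job with the default of 0 percent, while raising the percentage to 1 would let the job continue.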

Amar

On Jul 1, 2008, at 10:06 PM, Amar Kamat wrote:
Mori Bellamy wrote:
hey all,
i've got a mapreduce task that works on small (~1G) input. when i try to run the same task on large (~100G) input, i get the following error around when the map tasks are almost done (~98%)
2008-07-01 13:10:59,231 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Need 1 map output(s)
...
...
These are not error messages. The reducers are stuck because not all maps have completed. Mori, could you let us know what is happening to the other 2% of maps? Are they getting executed? Are they still pending (waiting to run)? Were they killed/failed? Is there any lost tracker?
I'm running the task on a cluster of 5 workers, one DFS master, and one task tracker.
What do you mean by 5 workers and 1 task tracker?
i'm chaining mapreduce tasks, so i'm using SequenceFileOutput and SequenceFileInput. this error happens before the first link in the chain successfully reduces.
Can you elaborate on this a bit? Are you chaining MR jobs?
Amar
does anyone have any insight? thanks!
