Hi,

All of the map tasks jump to 100% completion within seconds of the job 
launching. About 95% of the mappers actually finish within a minute. The 
problem is that sometimes 2-3 mappers take 20 minutes to complete (the input 
sizes are roughly the same for each mapper). Speculative execution should help 
in this case, but no redundant mappers are launched for these stragglers. 
Strangely enough, I have noticed that some redundant tasks are launched for 
tasks that don't really need them. My guess is that since the input is so tiny 
(17 short lines per mapper), streaming just reads all 17 lines and immediately 
reports 100% completion. In my case, it would work best if it worked like 
this: read one line, process, report (100/17)% complete, read one line, 
process, report (200/17)%, etc.
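
A minimal sketch of that read/process/report loop, assuming a Python mapper. 
The process() function and the Mapper/LinesProcessed counter names are 
hypothetical. The reporter:status and reporter:counter lines are the stderr 
messages that streaming recognizes. As far as I know they update the task's 
status string and counters rather than the progress percentage itself, so this 
may not change which tasks get speculated.

```python
LINES_PER_MAPPER = 17  # assumption: the NLineInputFormat setting from this thread


def process(line):
    """Hypothetical stand-in for the expensive per-line computation."""
    return line.upper()


def run_mapper(lines, out, err, total=LINES_PER_MAPPER):
    """Process one input line at a time and report after each one."""
    done = 0
    for raw in lines:
        line = raw.rstrip("\n")
        out.write(process(line) + "\n")
        done += 1
        # Streaming parses these stderr lines as status and counter updates.
        err.write("reporter:status:processed %d of %d lines\n" % (done, total))
        err.write("reporter:counter:Mapper,LinesProcessed,1\n")
        err.flush()
    return done

# In a real mapper script the last line would be:
#   run_mapper(sys.stdin, sys.stdout, sys.stderr)
```

The status text and the counter then show up per task in the JobTracker web 
UI, which at least makes it visible how far each straggler really is.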

Thanks,
Greg Lawrence

On 5/27/10 9:14 PM, "Rekha Joshi" <[email protected]> wrote:

Hi Gregory,

From what I recall, there was a discussion about having speculative execution 
for reducers alone, as having it on the map side raised concerns.
There might be two config params now, though, if you want to control it:
mapred.reduce.tasks.speculative.execution
mapred.map.tasks.speculative.execution

On the reporting side, when you say it's incorrect, by what margin? For 
example, how many map tasks are still in the running state when 100% is 
reported? I think there was a marginal rounding error at some point. Can you 
verify whether it's a side effect of speculative execution by trying the run 
with speculative mode turned off?
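
One way to run that comparison is to pass the two properties on the streaming 
command line with -D. The sketch below only builds the argument list. The jar 
location, the paths and the mapper name are placeholders, and the 
linespermap property is assumed to be the one that the old-API 
NLineInputFormat reads.

```python
def streaming_command(streaming_jar, input_path, output_path, mapper,
                      speculative_maps=True, speculative_reduces=True,
                      lines_per_map=17):
    """Build the argument list for a streaming job (all paths are placeholders)."""
    def flag(value):
        return "true" if value else "false"

    # Generic -D options have to come before the streaming-specific options.
    return [
        "hadoop", "jar", streaming_jar,
        "-D", "mapred.map.tasks.speculative.execution=" + flag(speculative_maps),
        "-D", "mapred.reduce.tasks.speculative.execution=" + flag(speculative_reduces),
        "-D", "mapred.line.input.format.linespermap=%d" % lines_per_map,
        "-inputformat", "org.apache.hadoop.mapred.lib.NLineInputFormat",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-file", mapper,
    ]
```

Calling it once with speculative_maps=False and once with the default gives 
the two runs to compare. The resulting list can be handed to subprocess.call().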

Cheers,
/R

On 5/28/10 2:07 AM, "Gregory Lawrence" <[email protected]> wrote:

Hi,

Does anybody know whether or not speculative execution works with Hadoop 
streaming?

If so, I have a script that does not appear to ever launch redundant mappers 
for the slow performers. This may be due to the fact that each mapper quickly 
reports (inaccurately) that it is 100% complete. I am using the 
NLineInputFormat and each mapper gets 17 lines of input. Each line requires a 
lot of computation. It appears that all 17 lines immediately get counted as 
being processed early on. Is there any way to report or force accurate 
completion stats? Could this explain why speculative execution never gets 
triggered?

Thanks,
Greg Lawrence
