Greg,

> Does anybody know whether or not speculative execution works with Hadoop
> streaming?
>
> If so, I have a script that does not appear to ever launch redundant mappers
> for the slow performers. This may be due to the fact that each mapper
> quickly reports (inaccurately) that it is 100% complete. I am using the
> NLineInputFormat and each mapper gets 17 lines of input. Each line requires
> a lot of computation. It appears that all 17 lines immediately get counted
> as being processed early on. Is there anyway to report or force accurate
> completion stats? Could this explain why speculative execution never gets
> triggered?
>
I am wondering if you are hitting
https://issues.apache.org/jira/browse/MAPREDUCE-1073. In M/R pipes jobs, the
map task progress moves to 100% as soon as the input is read, because the
processing happens asynchronously. As Sreekanth notes, this would result in
speculation not working as expected.

Thanks
Hemanth
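
For what it is worth, a streaming task can at least surface per-line status and counters by writing `reporter:status:<message>` and `reporter:counter:<group>,<counter>,<amount>` lines to stderr. The sketch below is only an illustration of that mechanism: `expensive_computation`, the `Mapper` group, and the `LinesProcessed` counter are made-up names, and as far as I know this does not change the progress percentage the framework computes from input consumption, so it should not be expected to make speculation kick in on its own.

```python
import sys


def status_line(message):
    # Streaming treats stderr lines of this form as a task status update.
    return "reporter:status:%s\n" % message


def counter_line(group, counter, amount):
    # Streaming treats stderr lines of this form as a counter increment.
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)


def expensive_computation(line):
    # Hypothetical placeholder for the real per-line work.
    return line.strip().upper()


def run_mapper(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    for index, line in enumerate(stdin, start=1):
        result = expensive_computation(line)
        stdout.write(result + "\n")
        # Report after each line so the task UI shows how far the mapper
        # really is, independently of the input-based progress figure.
        stderr.write(status_line("finished line %d" % index))
        stderr.write(counter_line("Mapper", "LinesProcessed", 1))
        stderr.flush()


if __name__ == "__main__":
    run_mapper()
```

The script would be passed as the `-mapper` of the streaming job; the counter then shows how many of the 17 lines each mapper has actually finished.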
