Hi,

Does anybody know whether speculative execution works with Hadoop streaming?

If it does, I have a script that never appears to launch redundant mappers for the slow performers. This may be because each mapper quickly (and inaccurately) reports that it is 100% complete. I am using NLineInputFormat, and each mapper gets 17 lines of input. Each line requires a lot of computation, yet all 17 lines appear to be counted as processed almost immediately.

Is there any way to report or force accurate completion stats? Could this explain why speculative execution never gets triggered?

Thanks,
Greg Lawrence
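
For reference, here is a minimal sketch of the kind of mapper in question, extended to write status and counter updates to stderr using the `reporter:status:` and `reporter:counter:` line formats that Hadoop streaming recognizes. The function names, the `Mapper`/`LinesProcessed` counter names, and `expensive_computation` are hypothetical placeholders, not taken from the real script. As far as I can tell, these messages only update the task's status string and counters; they probably do not change the percent-complete figure, which seems to follow how much input the framework has handed to the script rather than how much the script has finished.

```python
import sys


def expensive_computation(line):
    # Hypothetical stand-in for the heavy per-line work.
    return str(len(line))


def format_status(done, total):
    # Line format Hadoop streaming reads from stderr to set the task status.
    return "reporter:status:processed %d of %d lines\n" % (done, total)


def format_counter(group, counter, amount):
    # Line format Hadoop streaming reads from stderr to increment a counter.
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)


def run_mapper(stdin, stdout, stderr, total=17):
    """Process each input line and report after every line actually finished."""
    done = 0
    for raw in stdin:
        line = raw.rstrip("\n")
        if not line:
            continue
        result = expensive_computation(line)
        stdout.write("%s\t%s\n" % (line, result))
        done += 1
        # Report only once the line is really done, and flush so the
        # framework sees the update promptly.
        stderr.write(format_status(done, total))
        stderr.write(format_counter("Mapper", "LinesProcessed", 1))
        stderr.flush()
    return done


def main():
    # Entry point when the file is used as the streaming mapper script.
    run_mapper(sys.stdin, sys.stdout, sys.stderr)
```

To use this as the mapper, call `main()` at the bottom of the script. The counter should at least show in the job UI how many lines each mapper has really completed, even if the reported progress still jumps to 100% early.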
