[jira] [Commented] (MAPREDUCE-4013) Reduce task gets stuck when a M/R job is configured to tolerate failures

Bhallamudi Venkata Siva Kamesh (Commented) (JIRA) Mon, 26 Mar 2012 23:54:28 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239260#comment-13239260
 ]


Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-4013:
-----------------------------------------------------------

Thanks, Ravi, for taking a look at the patch.

bq.What about the "progress of map tasks" when there are failed-maps ? Is it 
getting updated to 100% ? I see copySucceded() is updating the progress of 
map-tasks. So what happens when the last few maps fail ?

Suppose a user has configured *mapreduce.map.failures.maxpercent* as 2, so the 
job can tolerate up to 2% of its map tasks failing. 
As "progress of map tasks" indicates the percentage of map tasks that have 
completed successfully, I *think* showing the actual *progress* may be more 
useful than showing 100%. 
For example, if "progress of map tasks" shows 99%, it at least tells the user 
that 1% of the map tasks have failed, and the user can then take action on 
those failed map tasks.

OTOH, if "progress of map tasks" should indicate the overall progress of the 
map phase, then the patch needs to be updated to reflect that.
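
To make the two readings concrete, here is a small arithmetic sketch. It is 
not code from the patch or from Hadoop: the function names are hypothetical, 
and the tolerance check (failed * 100 <= maxpercent * total) is only an 
assumption about how the limit might be applied.

```python
def succeeded_progress(succeeded, total):
    """Reading 1: percentage of map tasks that completed successfully."""
    if total == 0:
        return 100.0  # assumption: a job with no maps counts as complete
    return 100.0 * succeeded / total


def phase_progress(succeeded, failed, total):
    """Reading 2: percentage of map tasks that reached a final state,
    whether they succeeded or failed."""
    if total == 0:
        return 100.0  # assumption: a job with no maps counts as complete
    return 100.0 * (succeeded + failed) / total


def within_tolerance(failed, total, max_failures_percent):
    """Hypothetical check: True if the share of failed maps does not
    exceed the configured percentage."""
    return failed * 100 <= max_failures_percent * total


# 100 maps, 99 succeeded, 1 failed, mapreduce.map.failures.maxpercent = 2
print(succeeded_progress(99, 100))   # 99.0
print(phase_progress(99, 1, 100))    # 100.0
print(within_tolerance(1, 100, 2))   # True
```

With the first reading the user sees 99% and can tell that some maps failed; 
with the second the map phase reports 100% once every map has either 
succeeded or failed within the tolerated limit.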

As this issue has been marked as a duplicate, we can continue the discussion at 
[MAPREDUCE-3927|https://issues.apache.org/jira/browse/MAPREDUCE-3927]
                
> Reduce task gets stuck when a M/R job is configured to tolerate failures
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4013
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4013
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Amar Kamat
>            Priority: Blocker
>              Labels: shuffle
>             Fix For: 0.24.0
>
>         Attachments: MAPREDUCE-4013.patch
>
>
> When a M/R job is configured to run with some tolerance to task failures (via 
> mapreduce.map.failures.maxpercent), then the reduce task of that job gets 
> stuck in the shuffle phase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
