[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659621#comment-13659621 ]
Arun C Murthy commented on MAPREDUCE-4366:
------------------------------------------

Sorry, I've had a hard time coming around to this.

{quote}
There didn't seem to be a clear definition of speculative(Map|Reduce)Tasks, so the one I came up with is that the number of speculative(Map|Reduce)Tasks is the number of attempts running that are not on the critical path of the job completing. This makes sense in the context of computing pending(Map|Reduce)s, which is the only place the variable is used.
{quote}

Thanks for the explanation.

The definition of speculative(Map|Reduce)Tasks, at least in my head, has been the number of task-attempts that have an alternate... no, it's not a great one, or a documented one! *smile*

However, this has been the basis for a number of assumptions related to computing pending tasks etc. in various schedulers. (See the call hierarchy for JIP.pendingTasks.)

Since your change re-defines this, I'm afraid it breaks schedulers, e.g. CapacityScheduler. Hence, I'm against the change. I fully agree it isn't ideal, but I'd rather not make invasive changes in MR1 - the JT/JIP/Scheduler nexus scares me a lot... in fact, I'm officially terrified of it! *smile*

Now, to get around the metrics problem, how about making a more local change in JIP.garbageCollect?

An option is to just call decWaiting(Maps|Reduces) in JIP.garbageCollect with JIP.num(Maps|Reduces)... currently, if you follow the opposite side, i.e. addWaiting(Maps|Reduces), the additions are just static and are done in JIP.initTasks with num(Maps|Reduces).

Would that solve the immediate problem at hand? Thoughts?

----

Thanks again for checking in with me, and being patient in working through the mess we have!
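As a rough illustration of the symmetric bookkeeping suggested above: this is a self-contained toy model, not the actual MR1 code. The classes Metrics and Job are hypothetical stand-ins for the JobTracker instrumentation and JobInProgress, the tasksInited guard is an assumption added so that a job which never reached initTasks does not decrement anything, and any per-attempt adjustments made elsewhere are ignored.

```java
public class WaitingMetricsSketch {

    /** Hypothetical stand-in for the JobTracker-wide waiting counters. */
    static final class Metrics {
        private int waitingMaps;
        private int waitingReduces;

        void addWaitingMaps(int n)    { waitingMaps += n; }
        void decWaitingMaps(int n)    { waitingMaps -= n; }
        void addWaitingReduces(int n) { waitingReduces += n; }
        void decWaitingReduces(int n) { waitingReduces -= n; }

        int waitingMaps()    { return waitingMaps; }
        int waitingReduces() { return waitingReduces; }
    }

    /** Hypothetical stand-in for JobInProgress (JIP). */
    static final class Job {
        private final Metrics metrics;
        private final int numMaps;
        private final int numReduces;
        private boolean tasksInited; // assumption: guards against an unmatched decrement

        Job(Metrics metrics, int numMaps, int numReduces) {
            this.metrics = metrics;
            this.numMaps = numMaps;
            this.numReduces = numReduces;
        }

        /** Static addition, done once with the job's full task counts. */
        void initTasks() {
            metrics.addWaitingMaps(numMaps);
            metrics.addWaitingReduces(numReduces);
            tasksInited = true;
        }

        /** Mirror of initTasks: remove exactly what was added, at most once. */
        void garbageCollect() {
            if (tasksInited) {
                metrics.decWaitingMaps(numMaps);
                metrics.decWaitingReduces(numReduces);
                tasksInited = false;
            }
        }
    }

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        Job a = new Job(metrics, 10, 2);
        Job b = new Job(metrics, 4, 1);

        a.initTasks();
        b.initTasks();
        System.out.println("after init: maps=" + metrics.waitingMaps()
                + " reduces=" + metrics.waitingReduces());

        a.garbageCollect();
        b.garbageCollect();
        b.garbageCollect(); // second call is a no-op because of the guard
        System.out.println("after gc:   maps=" + metrics.waitingMaps()
                + " reduces=" + metrics.waitingReduces());
    }
}
```

Because each job removes exactly the amount it added, the counters in this model return to zero once every job has been garbage-collected, and they cannot go negative from this pair of calls alone.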
> mapred metrics shows negative count of waiting maps and reduces
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4366
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Thomas Graves
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4366-branch-1-1.patch, MAPREDUCE-4366-branch-1.patch
>
>
> Negative waiting_maps and waiting_reduces counts are observed in the mapred
> metrics. MAPREDUCE-1238 partially fixed this, but it appears there are still
> issues, as we are still seeing it, though not as bad.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira