[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797552#action_12797552 ]
Amar Kamat commented on MAPREDUCE-1342:
---------------------------------------

Had a brief discussion with Amareshwari on this. It looks like only JobTracker.activeTaskTrackers() and JobTracker.blacklistedTaskTrackers() call JobTracker.FaultyTrackersInfo.isBlacklisted() without holding the JobTracker lock. So, extending the comment [here|https://issues.apache.org/jira/browse/MAPREDUCE-1342?focusedCommentId=12796996&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796996], I think we can do something like:
{code}
FaultyTrackersInfo {
  getTaskTrackers(boolean blacklisted) {
    // always take potentiallyFaultyTrackers before taskTrackers, matching
    // the lock order used on the incrementFaults()/blackListTracker() path
    synchronized (potentiallyFaultyTrackers) {
      synchronized (taskTrackers) {
        // the code we have today in JobTracker.blacklistedTaskTrackers()
        for (TaskTracker tt : taskTrackers.values()) {
          if (isBlacklisted(tt) == blacklisted) {
            // add tt to the return set
          }
        }
      }
    }
  }
}

blacklistedTaskTrackers() {
  return faultyTrackers.getTaskTrackers(true);
}

activeTaskTrackers() {
  return faultyTrackers.getTaskTrackers(false);
}
{code}
Currently, activeTaskTrackers() and blacklistedTaskTrackers() differ only in whether the tasktracker is blacklisted or not. Thoughts?

> Potential JT deadlock in faulty TT tracking
> -------------------------------------------
>
>              Key: MAPREDUCE-1342
>              URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
>          Project: Hadoop Map/Reduce
>       Issue Type: Bug
>       Components: jobtracker
> Affects Versions: 0.22.0
>         Reporter: Todd Lipcon
>      Attachments: cycle0.png, mapreduce-1342-1.patch, mapreduce-1342-2.patch
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers and then calls blackListTracker, which calls removeHostCapacity, which locks JT.taskTrackers.
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then calls faultyTrackers.isBlacklisted(), which goes on to lock potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted and therefore could deadlock.
> Not sure if this goes back to 0.21 or is just in trunk.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
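
For reference, here is a minimal standalone sketch of the inverted lock ordering described in the report. It is not the actual JobTracker code; the lock objects, thread names, and sleeps are only stand-ins to make the race easy to hit. One thread takes potentiallyFaultyTrackers then taskTrackers (the incrementFaults path), the other takes them in the opposite order (the blacklistedTaskTrackers path), so running it will typically hang in exactly this kind of deadlock.
{code}
// Illustrative only: two locks acquired in opposite orders by two threads.
public class LockOrderInversionDemo {
  private static final Object potentiallyFaultyTrackers = new Object();
  private static final Object taskTrackers = new Object();

  public static void main(String[] args) throws InterruptedException {
    Thread heartbeatPath = new Thread(() -> {
      // analogous to incrementFaults() -> blackListTracker() -> removeHostCapacity()
      synchronized (potentiallyFaultyTrackers) {
        sleep(100); // widen the race window so both threads hold their first lock
        synchronized (taskTrackers) {
          System.out.println("heartbeat path acquired both locks");
        }
      }
    });

    Thread clientPath = new Thread(() -> {
      // analogous to blacklistedTaskTrackers() -> faultyTrackers.isBlacklisted()
      synchronized (taskTrackers) {
        sleep(100);
        synchronized (potentiallyFaultyTrackers) {
          System.out.println("client path acquired both locks");
        }
      }
    });

    heartbeatPath.start();
    clientPath.start();
    // With both threads parked on each other's first lock, these joins never return.
    heartbeatPath.join();
    clientPath.join();
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}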