[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767289#action_12767289
 ] 

Hemanth Yamijala commented on MAPREDUCE-1048:
---------------------------------------------

I am seeing some issues with the 0.20 patch:

- The UI reads the slot information without any synchronization.
- The reads are not atomic. IOW, the UI reads the map slots and the reduce 
slots in separate calls, and a heartbeat could update them in between.

Note that for the above two points, the cluster status model actually works 
correctly, because the UI reads the cluster status while synchronized on the 
JobTracker, and a snapshot of the values is captured in a new ClusterStatus 
object at the time of the read.
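
A minimal sketch of that snapshot pattern, for illustration only. The names 
SlotTracker, SlotSnapshot, heartbeat and snapshot are hypothetical and are not 
taken from the patch or from the JobTracker; the point is only that both 
counters are updated and copied under the same lock, so the UI can never see a 
map count and a reduce count from different heartbeats.

```java
// Hypothetical names throughout; this is not the JobTracker code.
final class SlotTracker {

    /** Immutable copy of both counters, taken at a single instant. */
    static final class SlotSnapshot {
        final int occupiedMapSlots;
        final int occupiedReduceSlots;

        SlotSnapshot(int occupiedMapSlots, int occupiedReduceSlots) {
            this.occupiedMapSlots = occupiedMapSlots;
            this.occupiedReduceSlots = occupiedReduceSlots;
        }
    }

    private int occupiedMapSlots;
    private int occupiedReduceSlots;

    /** Heartbeat path: both counters change while holding the lock. */
    synchronized void heartbeat(int mapDelta, int reduceDelta) {
        occupiedMapSlots += mapDelta;
        occupiedReduceSlots += reduceDelta;
    }

    /** UI path: one call, one lock acquisition, one consistent pair. */
    synchronized SlotSnapshot snapshot() {
        return new SlotSnapshot(occupiedMapSlots, occupiedReduceSlots);
    }
}
```

The UI would call snapshot() once and render both fields from the returned 
object, instead of calling two separate getters.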

- Reservation tracking seems broken in many ways:
 -- removeTrackerReservations is called in lostTaskTracker after the 
reservations are cancelled, so the information it needs to remove has already 
been cleared.
 -- It seems like ExpireLaunchingTasks can result in a tracker being globally 
blacklisted. But I don't see any add/removeTrackerReservations in this place.
 -- In processHeartbeat, there seem to be code paths where 
removeTrackerReservations is called twice, for example when the tracker is 
determined to be lost.
 -- In general, it is very, very hard to verify the correctness of this patch 
in the current form as the logic is spread out in multiple code paths, and it 
is difficult to verify if all the code paths are being covered.
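
One way to make this kind of bookkeeping easier to verify, sketched here under 
assumptions and not as what the patch does: keep the per-tracker reserved-slot 
counts in a single small class that remembers what each tracker contributed. 
Removal then does not depend on the tracker's reservations still being present, 
and a second removal is a no-op. ReservationBook and its methods are 
hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the patch or of the JobTracker.
final class ReservationBook {
    // What each tracker currently contributes to the total.
    private final Map<String, Integer> reservedByTracker =
        new HashMap<String, Integer>();
    private int totalReserved;

    /** Record the tracker's current reserved-slot count, replacing any old value. */
    synchronized void set(String tracker, int slots) {
        Integer old = reservedByTracker.put(tracker, slots);
        totalReserved += slots - (old == null ? 0 : old);
    }

    /** Drop the tracker's contribution; safe to call more than once. */
    synchronized void remove(String tracker) {
        Integer old = reservedByTracker.remove(tracker);
        if (old != null) {
            totalReserved -= old;
        }
    }

    synchronized int total() {
        return totalReserved;
    }
}
```

With this shape, every code path that loses, blacklists or expires a tracker 
can call remove unconditionally, and the order relative to cancelling the 
reservations no longer matters.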


> Show total slot usage in cluster summary on jobtracker webui
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-1048
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1048
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Amar Kamat
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>
>         Attachments: mapred-1048-v1.0.patch, mapred-1048-v1.1.patch, 
> patch-1048-0.20.txt, patch-1048-1.txt, patch-1048-2.txt, patch-1048-3.txt, 
> patch-1048.txt
>
>
> With High-Ram jobs coming into the picture, it's important to also show the 
> slot usage in cluster summary since total-running-maps < 
> total-slots-occupied. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.