[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908604#action_12908604
 ] 

Rajesh Balamohan commented on MAPREDUCE-1904:
---------------------------------------------

Thanks for the review comments, Arun.

1. For #1, I will post the profiler output showing which methods are expensive
in getLocalPathToRead().

2. For #2, LocalDirAllocator.confChanged() need not be called in this
TaskTracker code path.

Reason: Here the TaskTracker uses LocalDirAllocator to check for config
changes to "mapred.local.dir". Once it is read, this parameter does not change
over the TaskTracker's lifetime, so the check is not needed on every
invocation. Corner case: when the TaskTracker goes down and new configs are
reloaded, the LRUCache would also be repopulated.
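
For illustration, the caching idea could be sketched roughly as below. This is
not the code in the attached patches: the class name PathLruCacheSketch, the
slowLookup function standing in for the synchronized
LocalDirAllocator.getLocalPathToRead() call, and the use of String instead of
Hadoop's Path are all assumptions made to keep the example self-contained.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an LRU cache keyed by jobid + mapid whose value is the
// resolved file path, so the expensive lookup runs only on a cache miss.
class PathLruCacheSketch {

    // Bounded, access-ordered map that drops its least recently used entry.
    static final class LruMap<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruMap(int maxEntries) {
            super(16, 0.75f, true); // true = iterate in access order
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    private final Map<String, String> cache;
    // Stand-in (assumption) for the synchronized allocator lookup.
    private final Function<String, String> slowLookup;

    PathLruCacheSketch(int maxEntries, Function<String, String> slowLookup) {
        // get() reorders an access-ordered map, so every call must be guarded.
        this.cache = Collections.synchronizedMap(new LruMap<>(maxEntries));
        this.slowLookup = slowLookup;
    }

    String getPath(String jobId, String mapId) {
        String key = jobId + "/" + mapId;
        String path = cache.get(key);
        if (path == null) {
            // Two threads may both miss and look up the same key; the
            // duplicate work is harmless because both compute the same value.
            path = slowLookup.apply(key);
            cache.put(key, path);
        }
        return path;
    }

    public static void main(String[] args) {
        PathLruCacheSketch c =
            new PathLruCacheSketch(2, k -> "/tmp/mapred/local/" + k);
        System.out.println(c.getPath("job_1", "map_1"));
    }
}
```

The cache lock is held only for the short map operations, not for the lookup
itself, which is the point of moving repeated requests off the synchronized
allocator path. The cache size (2 above) is only a placeholder.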
 



> Reducing locking contention in TaskTracker.MapOutputServlet's 
> LocalDirAllocator
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1904
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.1
>            Reporter: Rajesh Balamohan
>         Attachments: MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch, 
> profiler output after applying the patch.jpg, TaskTracker- yourkit profiler 
> output .jpg, Thread profiler output showing contention.jpg
>
>
> While profiling the tasktracker with the Sort benchmark, it was observed that
> threads block on LocalDirAllocator.getLocalPathToRead() in order to get the
> index file and temporary map output file.
> As LocalDirAllocator is tied to the ServletContext, only one instance is
> available per tasktracker httpserver. Given the jobid & mapid,
> LocalDirAllocator retrieves the index file path and the temporary map output
> file path. getLocalPathToRead() is internally synchronized.
> Introducing an LRUCache for this lookup reduces the contention heavily
> (LRUCache with key = jobid + mapid and value = PATH to the file). The size of
> the LRUCache can be varied based on the environment, and I observed a
> throughput improvement on the order of 4-7% with the introduction of LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.