[
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907334#action_12907334
]
Arun C Murthy commented on MAPREDUCE-1904:
------------------------------------------
Couple of concerns:
# I'd like to understand what part of LocalDirAllocator.getLocalPathToRead is
expensive... it's fine to add a cache, but it's better to do it _after_ we
understand why we really need it.
# With this patch, the code path skips the sanity checks in
LocalDirAllocator.confChanged, which is called by
LocalDirAllocator.getLocalPathToRead. That is a concern. Those checks might
well be the expensive part of LocalDirAllocator.getLocalPathToRead, but we
need to confirm that before bypassing them.
Don't get me wrong, the focus of this jira is very useful - we just need to fix
it the 'right' way.
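
To make the second concern concrete, here is a minimal sketch of a cache that re-validates an entry before trusting it. Everything in it is hypothetical: PathLookup stands in for the call to LocalDirAllocator.getLocalPathToRead, ValidatingPathCache is not part of Hadoop or of the attached patches, and a File.exists() check is only a rough approximation of what confChanged verifies.

```java
import java.io.File;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the slow, synchronized allocator lookup. */
interface PathLookup {
    String find(String key);
}

/** Hypothetical cache that never returns a path without checking it still exists. */
class ValidatingPathCache {
    private final ConcurrentMap<String, String> cache =
        new ConcurrentHashMap<String, String>();
    private final PathLookup slowLookup;

    ValidatingPathCache(PathLookup slowLookup) {
        this.slowLookup = slowLookup;
    }

    String lookup(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            // Cheap sanity check on a hit: the file must still be on disk.
            if (new File(cached).exists()) {
                return cached;
            }
            // Stale entry: drop it only if nobody replaced it in the meantime.
            cache.remove(key, cached);
        }
        // Miss or stale hit: fall back to the slow path and remember the result.
        String fresh = slowLookup.find(key);
        if (fresh != null) {
            cache.put(key, fresh);
        }
        return fresh;
    }
}
```

The existence check adds one file system call per cache hit, so whether this is still a net win would have to be measured with the same profiler runs.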
> Reducing locking contention in TaskTracker.MapOutputServlet's
> LocalDirAllocator
> -------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1904
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tasktracker
> Affects Versions: 0.20.1
> Reporter: Rajesh Balamohan
> Attachments: MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch,
> profiler output after applying the patch.jpg, TaskTracker- yourkit profiler
> output .jpg, Thread profiler output showing contention.jpg
>
>
> While profiling tasktracker with Sort benchmark, it was observed that threads
> block on LocalDirAllocator.getLocalPathToRead() in order to get the index
> file and temporary map output file.
> As LocalDirAllocator is tied up with ServletContext, only one instance would
> be available per tasktracker httpserver. Given the jobid & mapid,
> LocalDirAllocator retrieves index file path and temporary map output file
> path. getLocalPathToRead() is internally synchronized.
> Introducing an LRUCache for this lookup reduces the contention heavily
> (LRUCache with key = jobid + mapid and value = PATH to the file). Size of the
> LRUCache can be varied based on the environment and I observed a throughput
> improvement on the order of 4-7% with the introduction of LRUCache.
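
The lookup cache described above could look roughly like the following. This is a sketch under stated assumptions, not the code in the attached patches: the class name MapOutputPathCache, the "/" key separator, and the use of String for paths are all made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache mapping jobid + mapid to a file path. */
class MapOutputPathCache {
    private final Map<String, String> cache;

    MapOutputPathCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        LinkedHashMap<String, String> lru =
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    // Evict the least recently used entry once the cap is exceeded.
                    return size() > maxEntries;
                }
            };
        // get() reorders entries in access-order mode, so reads need the lock too.
        this.cache = Collections.synchronizedMap(lru);
    }

    private static String key(String jobId, String mapId) {
        return jobId + "/" + mapId;
    }

    /** Returns the cached path, or null on a miss. */
    String get(String jobId, String mapId) {
        return cache.get(key(jobId, mapId));
    }

    void put(String jobId, String mapId, String path) {
        cache.put(key(jobId, mapId), path);
    }

    int size() {
        return cache.size();
    }
}
```

Note that this cache is itself guarded by a single lock. It can only reduce contention if a map lookup is much cheaper than the synchronized getLocalPathToRead() call it replaces, which is an assumption that should be checked by profiling.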