[
https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898981#action_12898981
]
Koji Noguchi commented on MAPREDUCE-2011:
-----------------------------------------
MAPREDUCE-1901 has a detailed proposal for handling the distributed cache
better for files loaded by the job client (-libjars).
As part of it, it says:
{quote}
The TaskTracker, on being requested to run a task requiring CAR resource md5_F
checks whether md5_F is localized.
* If md5_F is already localized - then nothing more needs to be done. the
localized version is used by the Task
* If md5_F is not localized - then its fetched from the CAR repository
{quote}
This Jira asks for almost the same thing, except that it uses the existing
mtime instead of the newly proposed md5_F.
If the goal is only to reduce the getFileStatus calls made to fetch the mtime,
an mtime check is enough and keeps the change small.
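
A minimal sketch of the mtime check described above, written as plain Java
with no Hadoop dependencies. The class name, method names, and the sample URI
are hypothetical and are not taken from the TaskTracker code. It assumes the
expected mtime is recorded once and handed to each task, so that a task
whose resource is already localized with a matching mtime needs no further
lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the actual TaskTracker code. */
class MtimeLocalizationSketch {
    // Cache URI -> mtime of the copy already localized on this node.
    private final Map<String, Long> localizedMtimes = new HashMap<>();
    // Counts simulated remote fetches, for illustration only.
    private int fetchCount = 0;

    /**
     * Returns true if the resource had to be fetched, false if the
     * localized copy with the same mtime could be reused.
     */
    synchronized boolean ensureLocalized(String uri, long expectedMtime) {
        Long localMtime = localizedMtimes.get(uri);
        if (localMtime != null && localMtime == expectedMtime) {
            return false; // already localized, nothing more to do
        }
        fetchCount++; // stands in for the remote status check and copy
        localizedMtimes.put(uri, expectedMtime);
        return true;
    }

    synchronized int getFetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        MtimeLocalizationSketch tracker = new MtimeLocalizationSketch();
        String jar = "hdfs://namenode/user/someone/lib.jar"; // hypothetical URI
        long mtime = 1281000000000L; // assumed to come with the job, not from a per-task lookup

        for (int task = 0; task < 500; task++) {
            tracker.ensureLocalized(jar, mtime);
        }
        System.out.println("fetches for 500 tasks: " + tracker.getFetchCount()); // 1

        tracker.ensureLocalized(jar, mtime + 1); // file changed, so fetch again
        System.out.println("fetches after mtime change: " + tracker.getFetchCount()); // 2
    }
}
```

In this sketch only the first task per resource, or a task that sees a
changed mtime, pays for a remote lookup; every other task is served from the
local map.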
> Reduce number of getFileStatus call made from every
> task(TaskDistributedCache) setup
> ------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: distributed-cache
> Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 distributed cache files and very
> short-lived tasks. With 500 map tasks launched per second, this resulted in
> 10,000 getFileStatus calls per second to the namenode. The namenode can
> handle this, but I am asking whether we can reduce it somehow.