[jira] Commented: (MAPREDUCE-2011) Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup

Koji Noguchi (JIRA) Mon, 16 Aug 2010 09:36:40 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898981#action_12898981
 ]


Koji Noguchi commented on MAPREDUCE-2011:
-----------------------------------------

MAPREDUCE-1901 has a detailed proposal for handling the distributed cache 
better for files loaded by the jobclient (-libjars).
As part of that proposal, it says:

{quote}
The TaskTracker, on being requested to run a task requiring CAR resource md5_F 
checks whether md5_F is localized.

    * If md5_F is already localized - then nothing more needs to be done. the 
localized version is used by the Task
    * If md5_F is not localized - then its fetched from the CAR repository
{quote}

This Jira asks for almost the same thing, except that it would use the existing 
mtime instead of the newly proposed md5_F.
If the goal is only to reduce the getFileStatus (mtime) calls, an mtime check 
is enough and keeps the change small.
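
To illustrate the idea, here is a minimal sketch, not the actual TaskTracker 
code. It assumes the job already carries the expected mtime of each cache file 
and that the TaskTracker remembers the mtime of every file it has localized. 
The names MtimeLocalizationCache, MtimeSource, fetchMtime and setup are 
hypothetical; MtimeSource stands in for a namenode lookup such as 
FileSystem.getFileStatus(path).getModificationTime().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: per-TaskTracker record of localized cache files, keyed by path and mtime. */
final class MtimeLocalizationCache {

    /** Hypothetical stand-in for a getFileStatus call to the namenode. */
    interface MtimeSource {
        long fetchMtime(String path);
    }

    private final MtimeSource namenode;
    // path -> mtime of the copy already localized on this node
    private final Map<String, Long> localizedMtime = new ConcurrentHashMap<>();

    MtimeLocalizationCache(MtimeSource namenode) {
        this.namenode = namenode;
    }

    /**
     * Called once per cache file during task setup.
     * Returns true if the namenode had to be contacted, false if the
     * already-localized copy could be reused.
     */
    boolean setup(String path, long expectedMtime) {
        Long local = localizedMtime.get(path);
        if (local != null && local == expectedMtime) {
            return false; // already localized with the mtime the job expects
        }
        long actual = namenode.fetchMtime(path);
        if (actual != expectedMtime) {
            throw new IllegalStateException(
                path + " changed: expected mtime " + expectedMtime + ", found " + actual);
        }
        // A real implementation would copy the file to local disk here.
        localizedMtime.put(path, actual);
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        MtimeLocalizationCache cache = new MtimeLocalizationCache(p -> {
            calls.incrementAndGet();
            return 1000L;
        });
        // 500 tasks of the same job all need the same cache file.
        for (int task = 0; task < 500; task++) {
            cache.setup("/libjars/a.jar", 1000L);
        }
        System.out.println(calls.get()); // prints 1
    }
}
```

In this sketch only the first task on a node pays for the namenode call; tasks 
that start concurrently before the first localization finishes may each make 
one call, which is harmless.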



> Reduce number of getFileStatus call made from every 
> task(TaskDistributedCache) setup
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>            Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 distributed cache files and very 
> short-lived tasks. With 500 map tasks launched per second, this resulted in 
> 10,000 getFileStatus calls per second to the namenode (20 files x 500 tasks). 
> The namenode can handle this, but I am asking whether we can reduce it somehow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
