[
https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Payne resolved MAPREDUCE-2011.
-----------------------------------
Resolution: Won't Fix
[~knoguchi], here are [~jlowe]'s comments from an offline discussion:
I think the distributed cache already behaves the way you want, at least in
YARN. When a resource request arrives at the nodemanager, it tries to look up
the local resource info based on that request. If it finds it (i.e., a cache
hit), it just increments the refcount of the resource; I don't see any attempt
to stat HDFS to verify the file is still there. The only time I see the
timestamp of the request compared with HDFS is when it tries to download the
resource from HDFS.
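To make the behavior described above concrete, here is a rough sketch in Java. It is not the actual nodemanager code: the class LocalResourceCacheSketch, its methods, and the path-plus-timestamp key are all hypothetical names made up for illustration, and the remote filesystem is replaced by a simple function. It only shows the pattern of "hit means bump the refcount, miss means stat and download".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of a refcounted local resource cache; not real YARN code. */
final class LocalResourceCacheSketch {
    // Assumed stand-in for the remote filesystem: maps a path to its modification time.
    private final ToLongFunction<String> remoteModTime;
    // Refcounts of localized resources, keyed by (path, timestamp) from the request.
    private final Map<String, Integer> refCounts = new HashMap<>();
    private int statCalls = 0;

    LocalResourceCacheSketch(ToLongFunction<String> remoteModTime) {
        this.remoteModTime = remoteModTime;
    }

    private static String key(String path, long timestamp) {
        return path + "@" + timestamp;
    }

    /** Handles one resource request and returns the refcount after the request. */
    int acquire(String path, long requestTimestamp) {
        String k = key(path, requestTimestamp);
        Integer current = refCounts.get(k);
        if (current != null) {
            // Cache hit: only the refcount changes, the remote filesystem is not consulted.
            refCounts.put(k, current + 1);
            return current + 1;
        }
        // Cache miss: stat the remote file once and compare timestamps before "downloading".
        statCalls++;
        long actual = remoteModTime.applyAsLong(path);
        if (actual != requestTimestamp) {
            throw new IllegalStateException("Resource " + path + " changed on the remote filesystem");
        }
        refCounts.put(k, 1);
        return 1;
    }

    int statCalls() {
        return statCalls;
    }

    int refCount(String path, long timestamp) {
        return refCounts.getOrDefault(key(path, timestamp), 0);
    }

    public static void main(String[] args) {
        LocalResourceCacheSketch cache = new LocalResourceCacheSketch(path -> 42L);
        for (int i = 0; i < 500; i++) {
            cache.acquire("/cache/a.jar", 42L);
        }
        System.out.println("stat calls: " + cache.statCalls()
            + ", refcount: " + cache.refCount("/cache/a.jar", 42L));
    }
}
```

In this sketch, 500 requests for the same resource cause a single stat on the miss and 499 refcount increments on the hits, which is the effect the issue asks for.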
> Reduce number of getFileStatus call made from every
> task(TaskDistributedCache) setup
> ------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: distributed-cache
> Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 distributed cache files and very
> short-lived tasks. About 500 map tasks were launched per second, resulting in
> 10,000 getFileStatus calls per second (20 x 500) to the namenode. The namenode
> can handle this, but we are asking whether the number can be reduced somehow.