[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542904#comment-14542904
 ] 

Chris Trezzo commented on MAPREDUCE-5951:
-----------------------------------------

1. [~kasha] I have thought more about removing the release API and have also 
discussed it with [~sjlee0]. I think it makes sense from a code simplicity 
standpoint to remove the release API. This will eliminate the need for multiple 
arrays and for keeping track of the resources you use on the client side. If we 
feel that it is needed later on, we can always add it back in. The major 
consequence of not releasing is that the SCM store will have more resource 
references to keep track of during the cleaner period (which currently 
defaults to 1 day). For the InMemorySCMStore, this means that there will be 
more SharedCacheResourceReference objects in memory.

Rough, hand-wavy calculation of heap size over a 24-hour period on a large 
cluster:
* 42k jobs per day x 600 resources per job = 25.2 million resource references
* A resource reference is made up of an ApplicationId and a ShortUserName.
** Let's say the ApplicationId is two longs, so 16 bytes, and the ShortUserName 
is 10 characters at 2 bytes each, so 20 bytes.
** Let's also multiply this number by 3 to account for Object overhead. So (16 
+ 20) * 3 = 108 bytes for a single resource reference.
* 25.2 million * 108 bytes = roughly 2.7 GB of total heap space
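
For anyone who wants to plug in their own cluster numbers, the arithmetic above 
can be sketched in a few lines of Java. The class and method names are 
hypothetical (they are not part of the patch), and every constant is one of the 
rough assumptions from the list above, not a measurement.

```java
public class SharedCacheHeapEstimate {
    // All figures are the rough assumptions from the estimate above.
    static final long JOBS_PER_DAY = 42_000L;
    static final long RESOURCES_PER_JOB = 600L;
    static final long APPLICATION_ID_BYTES = 16L;   // assumed: two longs
    static final long SHORT_USER_NAME_BYTES = 20L;  // assumed: 10 chars x 2 bytes
    static final long OBJECT_OVERHEAD_FACTOR = 3L;  // assumed fudge factor

    // Resource references created over one 24-hour period.
    static long referencesPerDay() {
        return JOBS_PER_DAY * RESOURCES_PER_JOB;
    }

    // Estimated heap footprint of a single resource reference.
    static long bytesPerReference() {
        return (APPLICATION_ID_BYTES + SHORT_USER_NAME_BYTES) * OBJECT_OVERHEAD_FACTOR;
    }

    // Estimated heap held by all references retained for one day.
    static long totalBytes() {
        return referencesPerDay() * bytesPerReference();
    }

    public static void main(String[] args) {
        System.out.println("references per day:  " + referencesPerDay());
        System.out.println("bytes per reference: " + bytesPerReference());
        System.out.println("total bytes:         " + totalBytes());
    }
}
```

With these inputs the total is 2,721,600,000 bytes, which is where the 
roughly 2.7 GB figure comes from.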

2.7 GB of extra memory does not strike me as too crazy. We can also trade 
off RM load for memory size and run the cleaner at a higher frequency. Thoughts 
from others? If that sounds reasonable, I will file a YARN JIRA to make the 
change.
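
To put a rough number on the cleaner-frequency trade-off, here is a small 
sketch. It assumes that jobs arrive at a constant rate and that unreleased 
references are held only until the next cleaner pass, so the retained heap 
scales linearly with the cleaner period. That is a simplification (references 
for still-running applications would stay around), and the class and method 
names are hypothetical.

```java
public class CleanerPeriodTradeoff {
    static final long REFERENCES_PER_DAY = 42_000L * 600L; // from the estimate above
    static final long BYTES_PER_REFERENCE = 108L;          // (16 + 20) * 3
    static final long MINUTES_PER_DAY = 24L * 60L;

    // Estimated heap held by unreleased references between two cleaner runs,
    // assuming references accumulate at a constant rate (a simplification).
    static long estimatedHeapBytes(long cleanerPeriodMinutes) {
        long references = REFERENCES_PER_DAY * cleanerPeriodMinutes / MINUTES_PER_DAY;
        return references * BYTES_PER_REFERENCE;
    }

    public static void main(String[] args) {
        long[] periodsInMinutes = {1440L, 720L, 360L, 60L};
        for (long period : periodsInMinutes) {
            System.out.println(period + " min -> " + estimatedHeapBytes(period) + " bytes");
        }
    }
}
```

Under that assumption, running the cleaner every 6 hours instead of every 24 
would cut the estimate from 2,721,600,000 bytes to 680,400,000 bytes, at the 
cost of four times as many cleaner runs.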

2. [~jlowe] I will add a comment that explains why we are now using '*' instead 
of MRJobConfig.JOB_JAR.

> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, 
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, 
> MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch, 
> MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
