[
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542904#comment-14542904
]
Chris Trezzo commented on MAPREDUCE-5951:
-----------------------------------------
1. [~kasha] I have thought more about removing the release API and also
discussed it with [~sjlee0]. From a code-simplicity standpoint, I think it makes
sense to remove the release API. This eliminates the need for multiple arrays
and for tracking, on the client side, which resources you used. If we find that
we need it later, we can always add it back. The main consequence of not
releasing is that the SCM store will have more resource references to track
during the cleaner period (currently 1 day by default). For the
InMemorySCMStore, this means more SharedCacheResourceReference objects in
memory.
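To make that consequence concrete, here is a toy model of a store with no release call. It is hypothetical and is not the actual InMemorySCMStore or SharedCacheResourceReference API: `ToySCMStore`, `use`, and `clean` are illustrative names, and it assumes a reference is simply an (application ID, short user name) pair. Applications only ever add references; the only thing that removes them is a periodic cleaner pass that drops references held by applications that are no longer running.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Hypothetical reference type: (application id, short user name).
Reference = Tuple[str, str]


@dataclass
class ToySCMStore:
    """Toy model of a shared cache store with no release API (not the real SCM)."""

    # Maps a resource key (e.g. a checksum) to the references currently held on it.
    refs: Dict[str, Set[Reference]] = field(default_factory=dict)

    def use(self, resource_key: str, app_id: str, user: str) -> None:
        """Record that an application uses a cached resource; nothing ever releases it."""
        self.refs.setdefault(resource_key, set()).add((app_id, user))

    def reference_count(self) -> int:
        """Total references currently held in memory across all resources."""
        return sum(len(r) for r in self.refs.values())

    def clean(self, running_apps: Set[str]) -> int:
        """Drop references whose application is no longer running.

        Without a release call, this periodic pass is the only way references
        are removed. Returns the number of references dropped.
        """
        removed = 0
        for refs in self.refs.values():
            stale = {r for r in refs if r[0] not in running_apps}
            refs -= stale  # in-place set difference, so the dict entry is updated
            removed += len(stale)
        return removed
```

Between cleaner passes, `reference_count()` only grows, which is exactly the extra in-memory state estimated below.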
Rough, hand-wavy calculation of heap size over a 24-hour period on a large
cluster:
* 42k jobs per day x 600 resources per job = 25.2 million resource references
* A resource reference is made up of an ApplicationId and a shortUserName.
** Let's say the ApplicationId is two longs, so 16 bytes, and the shortUserName
is 10 characters at 2 bytes per Java char, so 20 bytes.
** Let's also multiply this by 3 to account for object overhead: (16 + 20) x 3 =
108 bytes per resource reference.
* 25.2 million x 108 bytes = ~2.7 GB of total heap space
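The same back-of-the-envelope estimate as a small function, so the inputs can be varied. The parameter names and defaults only mirror the assumptions in the bullets (16-byte ApplicationId, 10-character user name at 2 bytes per char, 3x object overhead); they are not measured JVM object sizes.

```python
def estimate_reference_heap_bytes(jobs_per_day: int,
                                  resources_per_job: int,
                                  app_id_bytes: int = 16,
                                  user_name_chars: int = 10,
                                  bytes_per_char: int = 2,
                                  overhead_factor: int = 3) -> int:
    """Rough heap estimate for one day of unreleased resource references."""
    references = jobs_per_day * resources_per_job
    # Raw payload (ApplicationId + user name), scaled up for object overhead.
    bytes_per_reference = (app_id_bytes + user_name_chars * bytes_per_char) * overhead_factor
    return references * bytes_per_reference


total = estimate_reference_heap_bytes(42_000, 600)
print(f"{total / 1e9:.2f} GB")  # prints 2.72 GB
```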
2.7 GB of extra memory does not strike me as unreasonable. We could also trade
RM load for memory by running the cleaner more frequently. Thoughts from
others? If that sounds reasonable, I will file a YARN JIRA to make the change.
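A rough sketch of that trade-off, assuming jobs arrive uniformly over the day and every reference survives until the next cleaner run. Under those assumptions, peak retained references scale linearly with the cleaner period, while the number of cleaner runs per day scales inversely. Runs per day is used here only as a hypothetical proxy for RM load, since each run has to check application state.

```python
from typing import Tuple


def cleaner_tradeoff(jobs_per_day: int,
                     resources_per_job: int,
                     bytes_per_reference: int,
                     period_hours: float) -> Tuple[float, float]:
    """Return (approximate peak heap bytes, cleaner runs per day).

    Assumes uniform job arrival and that each reference lives until the
    next cleaner run, so at most one period's worth accumulates.
    """
    runs_per_day = 24 / period_hours
    peak_references = jobs_per_day * resources_per_job * period_hours / 24
    return peak_references * bytes_per_reference, runs_per_day


for hours in (24, 12, 6):
    heap, runs = cleaner_tradeoff(42_000, 600, 108, hours)
    print(f"period={hours:>2}h  peak heap ~{heap / 1e9:.2f} GB  cleaner runs/day={runs:g}")
```

With the numbers above, halving the period to 12 hours roughly halves the peak (about 1.36 GB) at the cost of two cleaner runs per day instead of one.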
2. [~jlowe] I will add a comment explaining why we now use '*' instead of
MRJobConfig.JOB_JAR.
> Add support for the YARN Shared Cache
> -------------------------------------
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-trunk-v1.patch,
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch,
> MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch,
> MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch,
> MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify
> which set of resources they would like to cache (i.e. jobjar, libjars,
> archives, files).
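One way a per-job setting could select which of these resources to cache is a comma-separated list of the four categories. The function name, the accepted values, and the `enabled`/`disabled` shorthands below are assumptions for illustration, not the configuration actually introduced by the patches.

```python
from typing import FrozenSet

# Resource categories named in the issue description.
RESOURCE_TYPES: FrozenSet[str] = frozenset({"jobjar", "libjars", "archives", "files"})


def parse_shared_cache_mode(value: str) -> FrozenSet[str]:
    """Parse a hypothetical per-job setting such as "jobjar,libjars".

    "enabled" selects every resource type; "disabled" or an empty value
    selects none. Unknown names raise ValueError instead of being ignored.
    """
    tokens = [t.strip().lower() for t in value.split(",") if t.strip()]
    if not tokens or tokens == ["disabled"]:
        return frozenset()
    if tokens == ["enabled"]:
        return RESOURCE_TYPES
    unknown = set(tokens) - RESOURCE_TYPES
    if unknown:
        raise ValueError(f"unknown shared cache resource types: {sorted(unknown)}")
    return frozenset(tokens)
```

Rejecting unknown names keeps a typo such as "libjar" from silently disabling caching for that resource type.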
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)