[
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Trezzo updated MAPREDUCE-5951:
------------------------------------
Attachment: MAPREDUCE-5951-trunk-v7.patch
Thanks again for the comments!
Attached is v7 of the patch. This version is rebased and addresses your
comments above. I removed the DistributedCache changes, addressed comments
about Job, JobID, JobImpl. With respect to comment 5.2, the patch does not
hard-code MR job submission to always use SharedCache. See if the new patch
improves clarity around that and let me know if you have more questions. There
are two changes that will happen even if the shared cache is disabled:
1. The SharedCacheConfig class will be used to parse configuration in
JobResourceUploader. If the shared cache config parameters do not exist, then
it is a no-op.
2. The MR classpath around job jars has been changed slightly (that is the reason
for the MRApps and TestMRApps changes), but should present no behavioral
changes to the user. This is to handle the case where the job jar used by a job
comes from the shared cache and it is named anything other than job.jar. Note
that the current code assumes that whatever is localized in the job.jar
directory is a single file named job.jar (i.e. job.jar/job.jar in the
classpath). In the case where the job.jar is named something else, it will not
get put on the classpath. This change simply puts everything in the job.jar
directory (currently only the job jar) on the classpath (i.e. job.jar/*).
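To make the classpath change in item 2 concrete, here is a minimal sketch of the difference between the two entries. It is only an illustration, not the MRApps code from the patch: the class and method names are hypothetical, and the matching rule is a simplified version of how a Java classpath wildcard picks up jar files in a directory.

```java
// Hypothetical illustration only; this is not the MRApps code from the patch.
class JobJarClasspathSketch {
    // Directory into which the job jar is localized.
    static final String JOB_JAR_DIR = "job.jar";

    // Current assumption: the localized file is always named job.jar.
    static String fixedEntry() {
        return JOB_JAR_DIR + "/job.jar";
    }

    // Entry described in item 2: everything in the job.jar directory.
    static String wildcardEntry() {
        return JOB_JAR_DIR + "/*";
    }

    // Returns true if a jar with the given file name, localized directly in
    // the job.jar directory, would be picked up by the classpath entry.
    static boolean covers(String entry, String jarFileName) {
        String path = JOB_JAR_DIR + "/" + jarFileName;
        if (entry.endsWith("/*")) {
            // Drop the '*' but keep the trailing slash.
            String dir = entry.substring(0, entry.length() - 1);
            boolean directChild = path.startsWith(dir)
                && path.indexOf('/', dir.length()) < 0;
            return directChild && jarFileName.endsWith(".jar");
        }
        return entry.equals(path);
    }

    public static void main(String[] args) {
        String sharedCacheJar = "my-app-1.0.jar"; // made-up jar name
        System.out.println(covers(fixedEntry(), sharedCacheJar));
        System.out.println(covers(wildcardEntry(), sharedCacheJar));
    }
}
```

With a jar named job.jar both entries behave the same, which is why the change should not be visible to existing users.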
With respect to the comment about fool-proof config: did you have anything
specific in mind? Currently the config should only recognize disabled, enabled,
jobjar, libjars, files, archives. I could split each into a separate boolean
config parameter if that seems safer. Let me know. I was trying to come up
with a concise single parameter for all the modes, but maybe splitting them up
into separate boolean parameters is better. I can also see the
JOBJAR_VISIBILITY parameter being slightly confusing and will think about whether there is
a better way to do that. Again, let me know if you have suggestions.
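As a starting point for the fool-proof config discussion, here is a rough sketch of how a single mode parameter could be parsed strictly. It is not the SharedCacheConfig code from the patch. It assumes the value is a comma-separated list, that a missing or empty value is a no-op, that "disabled" overrides anything else in the list, and that unrecognized tokens are rejected. The class, enum, and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the SharedCacheConfig class from the patch.
class SharedCacheModeSketch {
    enum Resource { JOBJAR, LIBJARS, FILES, ARCHIVES }

    // Parses a mode string (assumed to be comma-separated) into the set of
    // resource types that should go through the shared cache.
    static Set<Resource> parse(String value) {
        EnumSet<Resource> result = EnumSet.noneOf(Resource.class);
        if (value == null || value.trim().isEmpty()) {
            return result; // parameter absent: nothing is enabled
        }
        for (String token : value.split(",")) {
            String mode = token.trim().toLowerCase(Locale.ROOT);
            switch (mode) {
                case "disabled":
                    // Assumption: "disabled" wins over any other token.
                    return EnumSet.noneOf(Resource.class);
                case "enabled":
                    return EnumSet.allOf(Resource.class);
                case "jobjar":
                    result.add(Resource.JOBJAR);
                    break;
                case "libjars":
                    result.add(Resource.LIBJARS);
                    break;
                case "files":
                    result.add(Resource.FILES);
                    break;
                case "archives":
                    result.add(Resource.ARCHIVES);
                    break;
                case "":
                    break; // tolerate stray commas
                default:
                    // Fail fast instead of silently ignoring a typo.
                    throw new IllegalArgumentException(
                        "Unrecognized shared cache mode: " + token.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("jobjar, libjars"));
        System.out.println(parse(null));
    }
}
```

Rejecting unknown tokens is one way to keep a single parameter safe against typos. Separate boolean parameters would give the same guarantee at the cost of more keys.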
Also, let me know if you want me to split this patch further. I could see
splitting it into the following (although the splits won't be fully functional):
1. JobResourceUploader changes. The diff is still a little wonky with the code
restructure from adding shared cache checks.
2. TaskImpl changes.
3. JobImpl changes.
4. job.jar classpath changes.
> Add support for the YARN Shared Cache
> -------------------------------------
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: MAPREDUCE-5951-trunk-v1.patch,
> MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch,
> MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch
>
>
> Implement the necessary changes so that the MapReduce application can
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify
> which set of resources they would like to cache (i.e. jobjar, libjars,
> archives, files).