[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated MAPREDUCE-5951:
------------------------------------
    Attachment: MAPREDUCE-5951-trunk-v7.patch

[[email protected]] Thanks again for the comments!

Attached is v7 of the patch. This version is rebased and addresses your 
comments above. I removed the DistributedCache changes and addressed the 
comments about Job, JobID, and JobImpl. With respect to comment 5.2, the patch 
is not hard-coding MR job submission to always use SharedCache. See if the new patch 
improves clarity around that and let me know if you have more questions. There 
are two changes that will happen even if the shared cache is disabled:
1. The SharedCacheConfig class will be used to parse configuration in 
JobResourceUploader. If the shared cache config parameters do not exist, then 
it is a no-op.
2. The MR classpath around job jars has been changed slightly (that is the reason 
for the MRApps and TestMRApps changes), but should present no behavioral 
changes to the user. This is to handle the case where the job jar used by a job 
comes from the shared cache and it is named anything other than job.jar. Note 
that the current code assumes that whatever is localized in the job.jar 
directory is a single file named job.jar (i.e. job.jar/job.jar in the 
classpath). In the case where the job.jar is named something else, it will not 
get put on the classpath. This change simply puts everything in the job.jar 
directory (currently only the job jar) on the classpath (i.e. job.jar/*).
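
To illustrate the classpath point above, here is a minimal sketch, not the 
actual MRApps code. The class and method names are hypothetical; only the two 
entries (job.jar/job.jar and job.jar/*) come from the description above. It 
relies on the standard Java rule that a dir/* classpath entry picks up every 
.jar or .JAR file in that directory.

```java
public class JobJarClasspathSketch {
    // Name of the directory the job jar is localized into (from the text above).
    static final String JOB_JAR_DIR = "job.jar";

    // Entry used by the current code: assumes the file is literally job.jar.
    static String oldEntry() {
        return JOB_JAR_DIR + "/job.jar";
    }

    // Entry used after the change: everything in the job.jar directory.
    static String newEntry() {
        return JOB_JAR_DIR + "/*";
    }

    // Hypothetical helper: would a jar with this file name, localized in the
    // job.jar directory, be picked up by the given classpath entry?
    static boolean covers(String entry, String jarFileName) {
        String prefix = JOB_JAR_DIR + "/";
        if (!entry.startsWith(prefix)) {
            return false;
        }
        String tail = entry.substring(prefix.length());
        if (tail.equals("*")) {
            // Java classpath wildcards match files ending in .jar or .JAR.
            return jarFileName.endsWith(".jar") || jarFileName.endsWith(".JAR");
        }
        return tail.equals(jarFileName);
    }

    public static void main(String[] args) {
        String sharedCacheJar = "myapp-1.0.jar"; // hypothetical shared cache jar name
        System.out.println(covers(oldEntry(), sharedCacheJar));
        System.out.println(covers(newEntry(), sharedCacheJar));
    }
}
```

With the old entry, a job jar named anything other than job.jar is missed; 
with the wildcard entry, both job.jar and a differently named jar are covered.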

With respect to the comment about fool-proof config: did you have anything 
specific in mind? Currently the config should only recognize disabled, enabled, 
jobjar, libjars, files, archives. I could split each into a separate boolean 
config parameter if that seems safer. Let me know. I was trying to come up 
with a concise single parameter for all the modes, but maybe splitting them up 
into separate boolean parameters is better. I can also see the 
JOBJAR_VISIBILITY parameter being slightly confusing and will think about 
whether there is a better way to do that. Again, let me know if you have suggestions.
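
As a rough sketch of what stricter parsing of the single mode parameter could 
look like (this is not the SharedCacheConfig code in the patch): the class 
name, the enum, the comma-separated value format, and the choice to reject 
unknown tokens are all assumptions made for illustration. Only the six 
recognized values come from the discussion above.

```java
import java.util.EnumSet;
import java.util.Locale;

public class SharedCacheModeSketch {
    // Resource categories a job may ask to have cached.
    enum Resource { JOBJAR, LIBJARS, FILES, ARCHIVES }

    // Parses a value such as "jobjar,libjars" (format assumed).
    // "disabled" turns everything off, "enabled" turns everything on,
    // and any unrecognized token is rejected rather than silently ignored.
    static EnumSet<Resource> parse(String value) {
        EnumSet<Resource> result = EnumSet.noneOf(Resource.class);
        if (value == null) {
            return result; // parameter absent: no-op, shared cache unused
        }
        boolean disabled = false;
        boolean enabled = false;
        for (String raw : value.split(",")) {
            String token = raw.trim().toLowerCase(Locale.ROOT);
            if (token.isEmpty()) {
                continue;
            }
            switch (token) {
                case "disabled": disabled = true; break;
                case "enabled":  enabled = true; break;
                case "jobjar":   result.add(Resource.JOBJAR); break;
                case "libjars":  result.add(Resource.LIBJARS); break;
                case "files":    result.add(Resource.FILES); break;
                case "archives": result.add(Resource.ARCHIVES); break;
                default:
                    throw new IllegalArgumentException(
                        "Unrecognized shared cache mode: " + raw.trim());
            }
        }
        if (disabled) {
            return EnumSet.noneOf(Resource.class); // "disabled" wins (assumption)
        }
        return enabled ? EnumSet.allOf(Resource.class) : result;
    }

    public static void main(String[] args) {
        System.out.println(parse("jobjar, libjars"));
    }
}
```

Splitting the modes into separate boolean parameters would remove the need for 
token parsing altogether, at the cost of four parameters instead of one.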

Also, let me know if you want me to split this patch further. I could see 
splitting it into the following (although the splits won't be fully functional):
1. JobResourceUploader changes. The diff is still a little wonky with the code 
restructure from adding shared cache checks.
2. TaskImpl changes.
3. JobImpl changes.
4. job.jar classpath changes.

> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, 
> MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, 
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, 
> MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
