[
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541107#comment-14541107
]
Karthik Kambatla commented on MAPREDUCE-5951:
---------------------------------------------
Thanks Chris.
Comments on the latest patch:
# DistributedCache changes are all spurious. Omit it from the diff.
# Remove Job#addArchiveToSharedCacheAndClasspath altogether?
# JobImpl#cleanupSharedCacheResources should be always called when the job
finishes, irrespective of success.
# JobResourceUploader
## stopSharedCache should set scClient to null
## We don't need a separate boolean to check if SCM is available. We should be
able to just use (sClient != null). If we run into any issues talking to SCM,
we should just abort and call stopSharedCache to avoid using it for the rest of
the dependencies.
## uploadFiles seems to be trying to handle the case where shared-cache goes
down after we set all the files that need to be uploaded. We should leave this
check to the NM. The SCM might come back up by the time the NM tries uploading
files. Or, it could be available at this point and go down later.
{code}
// if scm fails in the middle, we will set shared cache upload policies
// for all resources
// to be false. The resources that are shared successfully via
// SharedCacheClient.use will
// continued to be shared.
if (scClient != null && !isScmAvailable()) {
{code}
## uploadFiles checks if shared-cache is available before setting the upload
policies. I think we should set the upload policies irrespective of SCM's
availability when shared-cache is not disabled. Also, we should modify the
below code to have the second if inside the first if.
{code}
if (isSharedCacheFilesEnabled()) {
newPath = useSharedCache(tmp, conf);
}
// need to inform NM to upload the file to shared cache.
if (newPath == null && isSharedCacheFilesEnabled()) {
filesSCUploadPolicies[indexOfFilesSCUploadPolicies] = true;
}
{code}
## useSharedCache should check for (scClient != null)
## uploadFiles has a lot of duplication. Can we file a follow-up JIRA to
simplify it?
## Also, due to the try-finally in uploadFiles, it is kind of hard to see what
lines have been changed. Do you think it makes any sense to wrap this in
another method call, so the indentations don't show spurious changes? I am
assuming most of this code will be touched by the follow-up JIRA.
## useSharedCache javadoc issue
# MRApps - we are using "*" instead of "job.jar" which should work, but I
wonder if that will be an incompatible behavior change. [~jlowe], [~vinodkv] -
what do you think?
# YarnRunner - indentation issue on line 360
> Add support for the YARN Shared Cache
> -------------------------------------
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-trunk-v1.patch,
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v2.patch,
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch,
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch,
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch,
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify
> which set of resources they would like to cache (i.e. jobjar, libjars,
> archives, files).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)