[
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319375#comment-14319375
]
Karthik Kambatla commented on MAPREDUCE-5951:
---------------------------------------------
Sorry for the delay in getting to this. Getting a continuous chunk of time to
look at this somewhat large patch was hard.
Here are my first round of comments - a combination of high-level and detailed
comments. Let us see if we can get some of this in through other JIRAs first,
to allow for a more thorough review.
# DistributedCache changes aren’t central to what this JIRA is trying to
address. Could we leave them out and address in another JIRA?
## This has nothing to do with this patch, but it would be nice to make the
code around setting CLASSPATH_FILES a little more readable. Could we define
another String prefix to hold “” or classpath, based on whether classpath is
null.
# Job
## The new APIs should all be @Unstable
## Let us make the javadoc for the new APIs a little more formal - we don’t
need to mention SCMClientProtocol.use, or that the APIs are intended for user
use. Even for the return value, I would go with something like “If shared cache
is enabled and the resource added successfully, return true. Otherwise, return
false.”
## How about renaming the methods to addFileToSharedCache,
addArchiveToSharedCache, addFileToSharedCacheAndClasspath?
## Make both new methods private static instead of static private.
# JobID changes might not be required. Use ConverterUtils#toApplicationId?
# JobImpl
## cleanupSharedCacheResources - nit: I would check for (checksums == null ||
checksums.length == 0) and return to save on indentations. 80 chars is already
too small.
## cleanupSharedCacheUploadPolicies - javadoc should use block comments. Well,
may be a nit.
# JobSubmitter
## Can we do the code moving from JobSumitter to FileUploader (may be, we need
a more descriptive name) to another JIRA and look at that first if needed.
Otherwise, it is hard to review the changes.
## May be, I am misreading the patch. Is this patch hardcoding MR job
submission to always use SharedCache? If yes, we should definitely avoid that.
# mapred-default.xml: We need a little more fool-proof config. The way the
patch currently is, a typo will lead to unexpected behavior without any
warnings.
> Add support for the YARN Shared Cache
> -------------------------------------
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: MAPREDUCE-5951-trunk-v1.patch,
> MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch,
> MAPREDUCE-5951-trunk-v6.patch
>
>
> Implement the necessary changes so that the MapReduce application can
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify
> which set of resources they would like to cache (i.e. jobjar, libjars,
> archives, files).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)