[
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817206#comment-15817206
]
Sangjin Lee commented on MAPREDUCE-5951:
----------------------------------------
I went over the patch with Chris in some detail today, and am posting the
review comments here for the record.
(MRJobConfig.java)
- {{mapreduce.job.jobjar.visibility}} and
{{mapreduce.job.jobjar.sharedcache.uploadpolicy}} are computed values that are
not user-facing; in that case we should not even define the defaults so that
there is no confusion that these values are computed
(JobResourceUploader.java)
- l.159-160: I understand lines 161-165 are there to support programmatic use
cases of the distributed cache that come in outside of the job submitter code
path. Can we make the comments clearer so that the intent of this comes
through? We could also annotate them in {{MRJobConfig}}.
- l.171-172: it might be slightly better to use {{LinkedHashMap}}. That way,
we'd have a predictable iteration order (the order in which they are specified
in the user values).
- l.219: To be fair, this is a bug we need to fix, right? Then can we file a
JIRA and add the JIRA id here?
- l.237-240: I would not worry about handling previous values here. Having
duplicate paths is not really supported and the worst case scenario here is to
reset this upload policy with the same value.
- l.260-263: I think we can improve on this, and reconcile the shared cache
with the wildcard feature. We could see if any resource is uploaded to the
staging directory, and if so, still preserve the wildcard entry. We also need
to consider the case where the shared cache is disabled but the wildcard is
enabled.
- l.291-294: same comment as above
- l.348-351: same comment as above
- l.388: Since we're dealing with a local filesystem URI based on l.381-382,
the authority check is not meaningful. We should remove this check.
- On a larger note, the path/URI handling between the job jar, libjars, files,
and archives is not very consistent, which is an existing behavior. We need to
see if they need to get the same consistent treatment for this to work.
(Job.java)
- l.1446: I realized later that passing an empty map has an effect of nulling
out the config value; perhaps we could make that more explicit in the javadoc
and/or comments/code?
- l.1449-1463: nit: it might be slightly easier to read by using a simple
string concatenation with "+" (JVM internally uses the {{StringBuilder}})
- l.1490: here also it might be better to use {{LinkedHashMap}}
> Add support for the YARN Shared Cache
> -------------------------------------
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-trunk-v1.patch,
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch,
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch,
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch,
> MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch,
> MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch,
> MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch,
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch,
> MAPREDUCE-5951-trunk.018.patch
>
>
> Implement the necessary changes so that the MapReduce application can
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify
> which set of resources they would like to cache (i.e. jobjar, libjars,
> archives, files).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]