[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

Sangjin Lee (JIRA) Tue, 10 Jan 2017 21:14:40 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817206#comment-15817206
 ]


Sangjin Lee commented on MAPREDUCE-5951:
----------------------------------------

I went over the patch with Chris in some detail today, and am posting the 
review comments here for the record.

(MRJobConfig.java)
- {{mapreduce.job.jobjar.visibility}} and 
{{mapreduce.job.jobjar.sharedcache.uploadpolicy}} are computed values that are 
not user-facing; in that case we should not even define the defaults so that 
there is no confusion that these values are computed

(JobResourceUploader.java)
- l.159-160: I understand lines 161-165 are there to support programmatic use 
cases of the distributed cache that come in outside of the job submitter code 
path. Can we make the comments clearer so that the intent of this comes 
through? We could also annotate them in {{MRJobConfig}}.
- l.171-172: it might be slightly better to use {{LinkedHashMap}}. That way, 
we'd have a predictable iteration order (the order in which they are specified 
in the user values).
- l.219: To be fair, this is a bug we need to fix, right? Then can we file a 
JIRA and add the JIRA id here?
- l.237-240: I would not worry about handling previous values here. Having 
duplicate paths is not really supported and the worst case scenario here is to 
reset this upload policy with the same value.
- l.260-263: I think we can improve on this, and reconcile the shared cache 
with the wildcard feature. We could see if any resource is uploaded to the 
staging directory, and if so, still preserve the wildcard entry. We also need 
to consider the case where the shared cache is disabled but the wildcard is 
enabled.
- l.291-294: same comment as above
- l.348-351: same comment as above
- l.388: Since we're dealing with a local filesystem URI based on l.381-382, 
the authority check is not meaningful. We should remove this check.
- On a larger note, the path/URI handling between the job jar, libjars, files, 
and archives is not very consistent, which is an existing behavior. We need to 
see if they need to get the same consistent treatment for this to work.

(Job.java)
- l.1446: I realized later that passing an empty map has an effect of nulling 
out the config value; perhaps we could make that more explicit in the javadoc 
and/or comments/code?
- l.1449-1463: nit: it might be slightly easier to read by using a simple 
string concatenation with "+" (JVM internally uses the {{StringBuilder}})
- l.1490: here also it might be better to use {{LinkedHashMap}}


> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, 
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, 
> MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch, 
> MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

Reply via email to