[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

Jason Lowe (JIRA) Fri, 28 Apr 2017 05:39:24 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988744#comment-15988744 ]


Jason Lowe commented on MAPREDUCE-5951:
---------------------------------------

I don't think it really matters whether the jar resource uploaded by the client 
is public or private.  In both cases the HDFS path to which the client posts 
the resource will be removed when the job completes.  If any subsequent jobs 
come along and figure out via the SCM (the shared cache manager) that they can avoid uploading their own, 
redundant copy of the same resource then they will receive a resource path 
within the SCM area which is a _different_ path than the one used by the first 
job.  That means the resource is going to get downloaded to the node again 
because it's in a different location than the first job's resource.
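A toy model makes the path-keying point concrete. It assumes, as described above, that a node reuses a localized resource only when the requested path matches one it has already downloaded. `NodeLocalizer` and both paths are hypothetical illustrations, not YARN's NodeManager API or its real directory layout.

```python
class NodeLocalizer:
    """Hypothetical node-side cache: one download per distinct resource path."""

    def __init__(self):
        self.local_copies = {}  # resource path -> local copy name
        self.downloads = 0

    def localize(self, path):
        # Assumption: hits are keyed by path only, so identical bytes stored
        # under a different path still count as a miss and are re-downloaded.
        if path not in self.local_copies:
            self.downloads += 1
            self.local_copies[path] = f"local-{self.downloads}"
        return self.local_copies[path]


node = NodeLocalizer()
# Job 1 localizes the jar from its own (hypothetical) staging path.
node.localize("/user/alice/.staging/job_1/job.jar")
# Job 2 gets the same jar, but under a (hypothetical) shared cache path.
node.localize("/sharedcache/abc123/job.jar")
print(node.downloads)  # 2: same content, two downloads
```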

Even if the first job's client uploads the resource to a public directory, no 
other job is going to ask for that resource under the same path.  It will be 
uploaded to a public staging directory which is specific to that app and whose 
path exists only as long as the app.  The problem with having jobs share 
resources automatically from the job client alone is knowing when the resource 
can be removed; otherwise we could either yank it just as another app tries to 
localize it or never clean it up at all.  That's why the SCM does the necessary 
reference counting to know what's in use and when resources can be freed safely.  If 
we want to avoid the double-download of the resource then the job client will 
need to upload the resource to the SCM directly and then submit the job _after_ 
it has received the public resource path from the SCM.
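The proposed upload-first flow can be sketched the same way. This is a simplified model that assumes a hypothetical `SharedCacheManager` which tracks the apps referencing each resource (keyed by a content checksum) and evicts only entries nobody references. The class, its methods, the checksum, and the `/sharedcache/...` path are illustrative assumptions, not the real SCM protocol or its cleanup policy. Because the client asks the SCM before submitting, both jobs end up requesting the same path, so a node would download the resource only once.

```python
class SharedCacheManager:
    """Hypothetical, simplified SCM: maps content checksums to shared paths
    and records which apps currently reference each entry."""

    def __init__(self):
        self.paths = {}  # checksum -> shared cache path
        self.users = {}  # checksum -> set of app ids referencing it

    def use(self, checksum, app_id):
        """Record a reference and return the cached path, or None on a miss."""
        if checksum not in self.paths:
            return None
        self.users[checksum].add(app_id)
        return self.paths[checksum]

    def upload(self, checksum, app_id):
        """Hypothetical direct upload by the job client; path layout is made up."""
        path = f"/sharedcache/{checksum}/job.jar"
        self.paths[checksum] = path
        self.users[checksum] = {app_id}
        return path

    def release(self, checksum, app_id):
        """Drop app_id's reference, e.g. when its application finishes."""
        self.users[checksum].discard(app_id)

    def cleanup(self):
        """Evict only entries with no remaining references."""
        for checksum in [c for c, apps in self.users.items() if not apps]:
            del self.paths[checksum]
            del self.users[checksum]


def resolve_before_submit(scm, checksum, app_id):
    """Client side: obtain the shared path first, then submit the job with it."""
    path = scm.use(checksum, app_id)
    if path is None:
        path = scm.upload(checksum, app_id)
    return path


scm = SharedCacheManager()
p1 = resolve_before_submit(scm, "abc123", "app_1")
p2 = resolve_before_submit(scm, "abc123", "app_2")
print(p1 == p2)  # True: both jobs localize the same path

scm.release("abc123", "app_1")
scm.cleanup()
print("abc123" in scm.paths)  # True: app_2 still references it

scm.release("abc123", "app_2")
scm.cleanup()
print("abc123" in scm.paths)  # False: no references left, safe to free
```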


> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).
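The per-job configuration described above could be modeled as a single setting naming which resource types to cache. The sketch below assumes a comma-separated syntax with `enabled`/`disabled` shorthands; that syntax and `parse_cache_mode` are hypothetical illustrations, not the interface the patch actually adds.

```python
RESOURCE_TYPES = {"jobjar", "libjars", "archives", "files"}


def parse_cache_mode(value):
    """Map a hypothetical per-job setting to the resource types to cache.

    Accepts "disabled", "enabled" (all types), or a comma-separated
    subset such as "jobjar,libjars". Any other token is rejected.
    """
    tokens = {t.strip().lower() for t in value.split(",") if t.strip()}
    if not tokens or tokens == {"disabled"}:
        return set()
    if "enabled" in tokens:
        return set(RESOURCE_TYPES)
    unknown = tokens - RESOURCE_TYPES
    if unknown:
        raise ValueError(f"unknown resource types: {sorted(unknown)}")
    return tokens


print(sorted(parse_cache_mode("jobjar, libjars")))  # ['jobjar', 'libjars']
print(sorted(parse_cache_mode("enabled")))  # ['archives', 'files', 'jobjar', 'libjars']
```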



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

