[
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988744#comment-15988744
]
Jason Lowe commented on MAPREDUCE-5951:
---------------------------------------
I don't think it really matters whether the jar resource uploaded by the client
is public or private. In both cases the HDFS path to which the client uploads
the resource will be removed when the job completes. If a subsequent job comes
along and finds out via the SCM that it can avoid uploading its own redundant
copy of the same resource, it will receive a resource path within the SCM area,
which is a _different_ path from the one used by the first job. That means the
resource is going to get downloaded to the node again, because it's in a
different location than the first job's resource.

Even if the first job's client uploads the resource to a public directory, no
other job is going to ask for that resource under the same path. It will be
uploaded to a public staging directory which is specific to that app and whose
path exists only as long as the app does.

The problem with having jobs try to share resources automatically just from the
job client is knowing when the resource can be removed. Without that knowledge
we could either yank it just as another app tries to localize it, or never
clean it up. That's why the SCM does the necessary ref counting to know what's
being used and when resources can be freed safely.

If we want to avoid the double download of the resource, then the job client
will need to upload the resource to the SCM directly and then submit the job
_after_ it has received the public resource path from the SCM.
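
To make the ref-counting argument concrete, here is a rough, self-contained
sketch of the bookkeeping. It is a toy model only and not the real YARN
SharedCacheClient or SCM code: the class name, the method names, and the
/sharedcache/... path layout are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a reference-counted shared cache. Hypothetical, not the YARN API. */
class SharedCacheSketch {
    // checksum -> ids of the apps currently referencing the cached resource
    private final Map<String, Set<String>> users = new HashMap<>();

    /** Returns the shared path and records the app as a user, or null on a cache miss. */
    public String use(String appId, String checksum) {
        Set<String> refs = users.get(checksum);
        if (refs == null) {
            return null; // not cached: the caller has to upload its own copy
        }
        refs.add(appId);
        return pathFor(checksum);
    }

    /** Adds the resource to the cache on behalf of appId and returns the shared path. */
    public String upload(String appId, String checksum) {
        users.computeIfAbsent(checksum, k -> new HashSet<>()).add(appId);
        return pathFor(checksum);
    }

    /** Drops appId's reference, e.g. when the app completes. */
    public void release(String appId, String checksum) {
        Set<String> refs = users.get(checksum);
        if (refs != null) {
            refs.remove(appId);
        }
    }

    /** A resource is only safe to remove once no app references it. */
    public boolean canEvict(String checksum) {
        Set<String> refs = users.get(checksum);
        return refs == null || refs.isEmpty();
    }

    // Assumed layout: the shared path depends only on the checksum, never on the app.
    private static String pathFor(String checksum) {
        return "/sharedcache/" + checksum + "/job.jar";
    }

    public static void main(String[] args) {
        SharedCacheSketch scm = new SharedCacheSketch();
        // First job: cache miss, so it uploads to the cache before submitting.
        String first = scm.upload("app_1", "abc123");
        // Second job: cache hit, same path, so a node could reuse its local copy.
        String second = scm.use("app_2", "abc123");
        System.out.println(first.equals(second));
        scm.release("app_1", "abc123");
        System.out.println(scm.canEvict("abc123"));
    }
}
```

The point of the sketch is that both jobs end up with the same app-independent
path, and the resource only becomes removable once every app that asked for it
has released it. A per-app staging directory gives neither property.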
> Add support for the YARN Shared Cache
> -------------------------------------
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-Overview.001.pdf,
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch,
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch,
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch,
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch,
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch,
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch,
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch,
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch,
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch,
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify
> which set of resources they would like to cache (i.e. jobjar, libjars,
> archives, files).