[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4421:
----------------------------------

    Attachment: MAPREDUCE-4421-3.patch

Thanks for taking another look, Hitesh.

bq. Regarding addMRFrameworkToDistributedCache() - one minor question: the code 
allows for a non-qualified URI. Should we enforce provision of a 
fully-qualified path always?

I thought it would be easier to let the URI be qualified by the cluster's 
configured defaults if it is not already fully qualified.  Otherwise 
users/admins could not just say "hdfs:/path/to/archive"; they would have to say 
"hdfs://namenode:port/path/to/archive", and if/when the name or port of the 
filesystem changes, that setting breaks.  If we let it be qualified by cluster 
defaults, then admins can update the default filesystem in core-site and the 
simpler forms continue to work unmodified.
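
To illustrate the idea, here is a minimal standalone sketch of qualifying a 
URI against a default filesystem.  It is not the code from the patch: the class 
and method names are hypothetical, it uses only java.net.URI rather than the 
Hadoop FileSystem/Path classes, and it assumes the path component is absolute.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of the patch.
public class FrameworkUriQualifier {

    // Fills in a missing scheme and/or authority from the default filesystem
    // URI. Assumes the path component of 'uri' is absolute.
    public static URI qualify(URI uri, URI defaultFs) {
        String scheme = uri.getScheme();
        String authority = uri.getAuthority();
        if (scheme != null && authority != null) {
            return uri; // already fully qualified, leave untouched
        }
        if (scheme == null) {
            // No scheme given: take the scheme from the default filesystem,
            // and its authority too unless the URI supplied one.
            scheme = defaultFs.getScheme();
            if (authority == null) {
                authority = defaultFs.getAuthority();
            }
        } else if (scheme.equals(defaultFs.getScheme())) {
            // Scheme matches the default filesystem: borrow its authority.
            authority = defaultFs.getAuthority();
        }
        try {
            return new URI(scheme, authority, uri.getPath(),
                           uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify " + uri, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the default filesystem an admin sets in core-site.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        System.out.println(qualify(URI.create("hdfs:/path/to/archive"), defaultFs));
        System.out.println(qualify(URI.create("/path/to/archive"), defaultFs));
    }
}
```

With this approach, changing the stand-in default filesystem changes the 
result for the short forms, while an already fully-qualified URI is returned 
as given.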

bq. Minor nit: I believe there should be nothing in the implementation that 
requires HDFS as the storage for the MR tarball?

Good point.  I updated the documentation to refer to a distributed cache deploy 
rather than an HDFS deploy.  However, I did call out in the docs the 
performance ramifications of not using the cluster's default filesystem and a 
publicly-readable path for the archive.  Otherwise the job submitter could end 
up re-uploading the archive, and the nodes re-localizing the framework, for 
each job or each user.  It will work, but it will be slower than necessary.

> Remove dependency on deployed MR jars
> -------------------------------------
>
>                 Key: MAPREDUCE-4421
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-4421-2.patch, MAPREDUCE-4421-3.patch, 
> MAPREDUCE-4421.patch, MAPREDUCE-4421.patch
>
>
> Currently MR AM depends on MR jars being deployed on all nodes via implicit 
> dependency on YARN_APPLICATION_CLASSPATH. 
> We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, 
> probably, just rely on adding a shaded MR jar along with job.jar to the 
> dist-cache.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
