[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720108#comment-13720108 ]

Jason Lowe commented on MAPREDUCE-4421:
---------------------------------------

bq. Would it make sense to allow directories as well in 
mapreduce.application.framework.path? That would make it easier to swap out a 
jar without rebuilding the tarball.

The problem with directories is that they are officially unsupported in the
distributed cache. Besides that, from a practical standpoint, it is much more
difficult for a nodemanager to verify that it doesn't need to localize anything
when the item being localized is an arbitrary directory tree. That requires an
HDFS stat for every entry in the tree, versus just one for an archive.
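
To make the cost difference concrete, here is a minimal sketch under a
simplified model (an assumption, not nodemanager code): checking whether a
cached copy is stale costs one stat per remote entry. The Entry class and the
method names are hypothetical and only stand in for files and directories in
HDFS.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalizationCheckSketch {
    // Hypothetical in-memory model of a remote (HDFS) entry; not a Hadoop API.
    static final class Entry {
        final boolean isDirectory;
        final List<Entry> children = new ArrayList<>();

        Entry(boolean isDirectory) {
            this.isDirectory = isDirectory;
        }

        // Adds a child and returns this entry so trees can be built fluently.
        Entry add(Entry child) {
            children.add(child);
            return this;
        }
    }

    // An archive is one remote file: a single stat (size/modification time) decides staleness.
    static int statsForArchive() {
        return 1;
    }

    // A directory tree has to be walked: one stat for every entry, including the root itself.
    static int statsForTree(Entry entry) {
        int count = 1;
        if (entry.isDirectory) {
            for (Entry child : entry.children) {
                count += statsForTree(child);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The framework directory holds three jars plus a lib/ subdirectory with two more jars.
        Entry lib = new Entry(true).add(new Entry(false)).add(new Entry(false));
        Entry root = new Entry(true)
                .add(new Entry(false)).add(new Entry(false)).add(new Entry(false))
                .add(lib);
        System.out.println("archive: " + statsForArchive() + " stat(s)");
        System.out.println("directory tree: " + statsForTree(root) + " stat(s)");
    }
}
```

Even this small tree needs seven stats against one for the archive, and a real
MR framework directory contains far more jars than that.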

bq. Does the distributed cache actually cache things in between jobs?

Yes, it does if it can; it depends on the visibility of the item being
localized. If it's PUBLIC, the resource will be cached and reused among all
users and all jobs. If it's PRIVATE, the resource will be cached per user and
reused between jobs for that user. If it's APPLICATION, it will only be
localized for a single job. See LocalResourceVisibility and
ClientDistributedCacheManager.determineCacheVisibilities for details.
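
The three sharing scopes can be sketched as a cache key that includes more
context as visibility narrows. This is a simplified model of a single node's
cache, not the nodemanager's actual implementation; the Visibility enum below
only mirrors the values of LocalResourceVisibility, and cacheKey/localize are
hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

public class VisibilityCacheSketch {
    // Standalone stand-in mirroring the three values of YARN's LocalResourceVisibility.
    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    // Scopes for which a localized copy already exists on this (single, simulated) node.
    private final Set<String> localized = new HashSet<>();

    // The visibility decides how widely a localized copy may be shared.
    static String cacheKey(Visibility visibility, String user, String appId, String path) {
        switch (visibility) {
            case PUBLIC:
                return "public|" + path;                          // all users, all jobs
            case PRIVATE:
                return "private|" + user + "|" + path;            // one user, all of that user's jobs
            case APPLICATION:
                return "app|" + user + "|" + appId + "|" + path;  // one job only
            default:
                throw new IllegalArgumentException("unknown visibility: " + visibility);
        }
    }

    // Returns true if the resource had to be downloaded, false if a cached copy was reused.
    boolean localize(Visibility visibility, String user, String appId, String path) {
        return localized.add(cacheKey(visibility, user, appId, path));
    }

    public static void main(String[] args) {
        VisibilityCacheSketch node = new VisibilityCacheSketch();
        // First PUBLIC request downloads; a different user's job reuses the copy.
        System.out.println(node.localize(Visibility.PUBLIC, "alice", "app_1", "/share/mr.tgz"));
        System.out.println(node.localize(Visibility.PUBLIC, "bob", "app_2", "/share/mr.tgz"));
    }
}
```

The design choice this illustrates is that reuse is decided purely by which
fields participate in the key: dropping the user and application ID is what
makes a PUBLIC resource shareable across everyone.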

The javadoc is correct in that, even in the APPLICATION case, a resource is
localized only once per node, even when multiple containers run on that node.
When several tasks on the same node all need the resource, this is more
efficient than having each task read it from HDFS directly.
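
A minimal sketch of "localize once per node" for an APPLICATION resource,
assuming a simplified model in which concurrent containers ask a shared
per-node cache for the same file. ConcurrentHashMap.computeIfAbsent applies
its mapping function at most once per key, so only one simulated HDFS read
happens. NodeLocalizerSketch and its methods are hypothetical and do not
reflect the nodemanager's real localization service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class NodeLocalizerSketch {
    // Counts simulated remote reads; stands in for HDFS traffic.
    private final AtomicInteger remoteFetches = new AtomicInteger();
    // Local copies on this node, keyed by application and remote path.
    private final ConcurrentHashMap<String, String> localCopies = new ConcurrentHashMap<>();

    // Simulated download; a real nodemanager would copy the file to local disk here.
    private String fetch(String remotePath) {
        remoteFetches.incrementAndGet();
        return "/local/cache" + remotePath;
    }

    // computeIfAbsent applies the function at most once per key, even under concurrent calls.
    String localize(String appId, String remotePath) {
        return localCopies.computeIfAbsent(appId + "|" + remotePath, key -> fetch(remotePath));
    }

    // Runs `containers` concurrent tasks of one application that all need the same resource,
    // and returns how many remote reads were actually performed.
    static int remoteFetchesFor(int containers) {
        NodeLocalizerSketch node = new NodeLocalizerSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < containers; i++) {
                results.add(pool.submit(() -> node.localize("app_1", "/user/alice/job.jar")));
            }
            for (Future<String> result : results) {
                result.get(); // wait for each simulated container
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return node.remoteFetches.get();
    }

    public static void main(String[] args) {
        // Without a node-level cache, each of the 8 tasks would read the file from HDFS itself.
        System.out.println("remote reads for 8 containers: " + remoteFetchesFor(8));
    }
}
```

With eight containers the sketch performs a single remote read, whereas
letting each task fetch the file directly would cost eight.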
                
> Remove dependency on deployed MR jars
> -------------------------------------
>
>                 Key: MAPREDUCE-4421
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch
>
>
> Currently MR AM depends on MR jars being deployed on all nodes via implicit 
> dependency on YARN_APPLICATION_CLASSPATH. 
> We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, 
> probably, just rely on adding a shaded MR jar along with job.jar to the 
> dist-cache.
