[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340813#comment-15340813
 ] 

Sangjin Lee edited comment on MAPREDUCE-6719 at 6/21/16 1:03 AM:
-----------------------------------------------------------------

The latest patch looks good to me. The unit test timeout is a known 
intermittent failure (I can't locate an open JIRA for it, though). The 
deprecation warning is also an unfortunate result of {{DistributedCache}} being 
deprecated.

I do have a small quibble with the title of the JIRA. The title may give the 
impression that this JIRA lets one use a wildcard when specifying the libjars 
*argument*, but that's not the case. What this JIRA does (in conjunction with 
YARN-4958) is collapse the libjar directory into a wildcard entry in the 
configuration as well as in the container launch context. The current title 
can be confused with HADOOP-12747 ("support wildcard in libjars argument"). Can 
we modify the title to better describe this JIRA?


> -libjars should use wildcards to reduce the application footprint in the 
> state store
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6719
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6719
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: MAPREDUCE-6719.001.patch, MAPREDUCE-6719.002.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the ContainerLaunchContext's local resources 
> even though they're all uploaded to the same directory in HDFS. When using 
> tools like Crunch without an uber JAR or when trying to take advantage of the 
> shared cache, the number of libraries can be quite large. We've seen many 
> cases where we had to turn down the max number of applications to prevent ZK 
> from running out of heap because of the size of the state store entries.
>
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the Job and DistributedCache 
> classes. Rather than listing all files independently, this JIRA proposes to 
> replace the complete list of libdir files with the wildcarded libdir 
> directory, e.g. "libdir/*". This behavior is the same as the current behavior 
> when using -libjars, but avoids explicitly listing every file.
>
> This capability will also be exposed by the 
> {{DistributedCache.addCacheFile()}} method.
>
> See YARN-4958 for the NM side of the implementation and additional discussion.
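
The collapse described above can be illustrated with a minimal, dependency-free sketch. It is not the code from the attached patches: the class name {{LibjarCollapse}}, its methods, and the sample paths are hypothetical, and it assumes every jar was uploaded to a directory that holds nothing but those jars, so that "dir/*" names exactly the same set of files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the MAPREDUCE-6719 patches.
final class LibjarCollapse {

    // Returns everything before the last '/', or "" when there is no parent directory.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "" : path.substring(0, slash);
    }

    // Replaces the per-file entries of each directory with a single "dir/*" entry.
    // Assumption: each directory contains only the listed jars.
    static List<String> collapse(List<String> jarPaths) {
        // Group paths by parent directory, preserving first-seen order.
        Map<String, List<String>> byDir = new LinkedHashMap<>();
        for (String path : jarPaths) {
            byDir.computeIfAbsent(parentOf(path), k -> new ArrayList<>()).add(path);
        }
        List<String> collapsed = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byDir.entrySet()) {
            if (entry.getKey().isEmpty()) {
                // No parent directory to wildcard: keep the entries as they are.
                collapsed.addAll(entry.getValue());
            } else {
                collapsed.add(entry.getKey() + "/*");
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
            "/staging/job_1/libjars/a.jar",
            "/staging/job_1/libjars/b.jar",
            "/staging/job_1/libjars/c.jar");
        List<String> collapsed = collapse(jars);
        System.out.println(jars.size() + " entries before, "
            + collapsed.size() + " after: " + collapsed);
    }
}
```

The point of the sketch is only the size difference: the number of entries recorded per application no longer grows with the number of jars, which is what reduces the state store footprint. How the NM expands the wildcard during localization is covered in YARN-4958 and is not modeled here.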


