[ https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234968#comment-16234968 ]

Miklos Szegedi commented on MAPREDUCE-6994:
-------------------------------------------

Thank you, [~yufeigu].
bq. Method expandEnvironmentVariables() seems better to be in class Shell.
I intentionally want to keep this change separate from the other projects in 
the repo.
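For context, a minimal sketch of what an expandEnvironmentVariables() helper could look like. The method name comes from the patch; the regex-based substring expansion, the explicit environment map parameter, and the behavior of leaving unknown variables untouched are assumptions for illustration, not the patch's actual implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvExpander {
    // Matches tokens of the form $NAME (word characters only) -- an assumption.
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)");

    // Replace each $NAME token with its value from the given environment map;
    // variables missing from the map are left as-is.
    static String expandEnvironmentVariables(String input, Map<String, String> env) {
        Matcher m = VAR.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            m.appendReplacement(sb,
                    Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of("HADOOP_HOME", "/opt/hadoop");
        // Prints /opt/hadoop/share/hadoop/common
        System.out.println(expandEnvironmentVariables("$HADOOP_HOME/share/hadoop/common", env));
    }
}
```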
bq. The current design requires users to understand which directories should be 
collected, by providing multiple directories as "input". For example, users 
need to input 
$HADOOP_HOME/share/hadoop/client/:$HADOOP_HOME/share/hadoop/common/, etc. The 
benefit of this solution is that it is flexible no matter how the input 
directories are organized. However, the directory hierarchy in $HADOOP_HOME is 
fixed, especially in upstream output, so how about providing an option to input 
just a $HADOOP_HOME and have the tool figure out which sub-directories to get 
jars from? It seems that making the method collectPackages recursively traverse 
the input directory would be enough, since there is a predefined whitelist.
So the main design point is to include whatever is needed to run MapReduce 
jobs. By default that is the class path, so the class path is the default 
input. Changing it to a root directory would add to the traversal time, and I 
think it is not necessary in this case. The whitelist and the blacklist filter 
the class path to make sure that only the necessary jars are included. Walking 
the root directory risks including jars with the same name, or jars that are 
not needed because they were not on the class path in the original scenario. 
Changing the input to anything other than the class path is possible, but not 
advised.
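To make the design point concrete, here is a minimal sketch of filtering a class path with a whitelist and a blacklist. The class and method names, the substring-matching rule, and the sample jar paths are illustrative assumptions, not the tool's actual logic.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathFilter {
    // Keep class path entries whose file name matches some whitelist pattern
    // and no blacklist pattern (substring matching is an assumption here).
    static List<String> filterClassPath(String classPath,
                                        List<String> whiteList,
                                        List<String> blackList) {
        List<String> result = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            String name = new File(entry).getName();
            boolean allowed = whiteList.stream().anyMatch(name::contains);
            boolean blocked = blackList.stream().anyMatch(name::contains);
            if (allowed && !blocked) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String cp = String.join(File.pathSeparator,
                "/opt/hadoop/share/hadoop/common/hadoop-common-3.0.0.jar",
                "/opt/hadoop/share/hadoop/common/lib/junit-4.11.jar",
                "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.0.0.jar");
        // Only the two hadoop-* jars pass; junit is not whitelisted.
        List<String> kept = filterClassPath(cp,
                Arrays.asList("hadoop-"),   // whitelist
                Arrays.asList("junit"));    // blacklist
        kept.forEach(System.out::println);
    }
}
```

Starting from the class path rather than a directory tree keeps the traversal cheap and avoids picking up same-named jars that were never on the class path.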
I addressed all other comments.


> Uploader tool for Distributed Cache Deploy code changes
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6994
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6994
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>            Priority: Major
>         Attachments: MAPREDUCE-6994.000.patch
>
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
