[
https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234968#comment-16234968
]
Miklos Szegedi commented on MAPREDUCE-6994:
-------------------------------------------
Thank you, [~yufeigu].
bq. Method expandEnvironmentVariables() seems better to be in class Shell.
I intentionally want to keep this change separate from the other projects in
the repo.
bq. The current design requires users to understand which directories should
be collected by providing multiple directories for "input". For example, users
need to input
$HADOOP_HOME/share/hadoop/client/:$HADOOP_HOME/share/hadoop/common/, etc. The
benefit of this solution is that it is flexible no matter how the input
directories are organized. However, the directory hierarchy in $HADOOP_HOME is
fixed, especially in upstream output, so how about providing an option to just
input a $HADOOP_HOME and let the tool figure out which sub-directories to get
jars from? It seems like making the method collectPackages recursively
traverse the input directory would be enough, since there is a predefined
whitelist.
So the main design point is to include whatever is needed to run mapreduce
jobs. By default that is the class path, so the class path is the default
input. Changing it to a root directory would add to the traversal time, and I
think it is not necessary in this case. The whitelist and the blacklist filter
the class path to make sure only the necessary jars are included. Walking
through the root has the risk of including jars with the same name, or jars
that are not necessary because they were not on the class path in the original
scenario. Changing the input to anything other than the class path is
possible, but not advised.
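The whitelist/blacklist filtering of the class path described above could look
roughly like the following sketch. The class and method names here are
hypothetical illustrations of the idea, not the actual patch code:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of the design discussed above: the class path (not a root
// directory) is the input, and whitelist/blacklist fragments of jar
// file names decide which jars end up in the tarball.
public class ClasspathFilter {
    // Returns the jar entries of the class path whose file name matches
    // at least one whitelist fragment and no blacklist fragment.
    public static List<String> filter(String classPath,
                                      List<String> whitelist,
                                      List<String> blacklist) {
        List<String> result = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            String name = new File(entry).getName();
            if (!name.endsWith(".jar")) {
                continue; // skip directories and other non-jar entries
            }
            boolean allowed =
                whitelist.stream().anyMatch(name::contains)
                && blacklist.stream().noneMatch(name::contains);
            if (allowed) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

Because the filter only walks the class path entries, it cannot pick up
same-named jars from unrelated sub-directories, which is the risk a recursive
traversal of $HADOOP_HOME would introduce.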
I addressed all other comments.
> Uploader tool for Distributed Cache Deploy code changes
> -------------------------------------------------------
>
> Key: MAPREDUCE-6994
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6994
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Miklos Szegedi
> Assignee: Miklos Szegedi
> Priority: Major
> Attachments: MAPREDUCE-6994.000.patch
>
>
> The proposal is to create a tool that collects all available jars in the
> Hadoop classpath and adds them to a single tarball file. It then uploads the
> resulting archive to an HDFS directory. This saves the cluster administrator
> from having to set this up manually for Distributed Cache Deploy.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]