[ https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176220#comment-15176220 ]
Allen Wittenauer commented on HADOOP-12857:
-------------------------------------------
I have some sample code working. It was very enlightening and I know what to
do now. If we really do want to keep one directory, here's my current plan of
attack:
* Truly optional components (s3, azure, openstack, kafka, etc.) will have a
shellprofile built that users can enable by doing the necessary incantations.
I'm currently thinking I might be able to add content to hadoop-env.sh at build
time to actually turn these things on via a single env-var setting, or one per
feature. No promises. (Yes, I'm currently looking for my "Black Hat of Bash
Wizardry" to make this happen.) Worst case, it'll be a "copy and rename into
HADOOP_CONF_DIR".
* With some help from [~raviprak] to make me see the forest for the trees, I
can now build shell-parseable dependency lists at build time. I have two ways
I can process this: I can either store these lists in the hadoop-dist target
directory, or store them in each tool's own target directory under a
well-known name and use find to build the necessary shell magic at build time.
I'm leaning towards the latter, since that will allow mvn clean to work in
hadoop-dist in the expected way: there won't be a hidden dependency on
hadoop-tools having been built before mvn package.
* distch, distcp, archive-logs, etc. are extremely problematic. Using shell
profiles for these WILL NOT WORK since they a) aren't really optional and b)
removing them from the command line tools won't really help anyone. Currently
these commands load all of HADOOP_TOOLS_PATH, which is awful. I want to add to
libexec/ a tools directory that stores helper functions for the tools jars
required by the various subcommands. It will use code similar to, but distinct
from, the optional-component handling. It will key off a different filename
for the dependency list, and there will need to be a contract between the
helper function names and the dependency file name. (This sounds worse than it
is.)
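To make the first bullet concrete, here's a minimal sketch of what a generated shellprofile plus a hadoop-env.sh toggle could look like. Every name in it (HADOOP_OPTIONAL_TOOLS, _s3_hadoop_classpath, hadoop_add_classpath, HADOOP_TOOLS_HOME) is an illustrative assumption, not settled API:

```shell
# Hypothetical sketch only; all names below are placeholders.
# Stub standing in for the real classpath helper in hadoop-functions.sh:
hadoop_add_classpath() { CLASSPATH="${CLASSPATH:+${CLASSPATH}:}$1"; }

# A generated shellprofile hook for an optional component (here: s3).
# It only extends the classpath when the user opts in via a single
# env-var setting in hadoop-env.sh.
_s3_hadoop_classpath() {
  if [[ ",${HADOOP_OPTIONAL_TOOLS}," == *",s3,"* ]]; then
    hadoop_add_classpath "${HADOOP_TOOLS_HOME}/lib/hadoop-aws.jar"
  fi
}

# What a user might put in hadoop-env.sh to turn the feature on:
HADOOP_OPTIONAL_TOOLS=s3
HADOOP_TOOLS_HOME=/opt/hadoop/share/hadoop/tools
_s3_hadoop_classpath
echo "${CLASSPATH}"   # /opt/hadoop/share/hadoop/tools/lib/hadoop-aws.jar
```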
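And a toy illustration of the well-known-name + find idea from the second bullet; the filename ".tool-classpath" and the directory layout are made up for the example:

```shell
# Toy illustration only: the ".tool-classpath" name and layout are
# invented for this sketch, not an actual build convention.
tmp=$(mktemp -d)
mkdir -p "${tmp}/hadoop-aws/target" "${tmp}/hadoop-azure/target"
echo "aws-java-sdk.jar"  > "${tmp}/hadoop-aws/target/.tool-classpath"
echo "azure-storage.jar" > "${tmp}/hadoop-azure/target/.tool-classpath"

# At build time, locate every tool's dependency list by its well-known
# name and fold the contents into one shell-consumable value:
deps=$(find "${tmp}" -name .tool-classpath -exec cat {} + | sort | paste -sd: -)
echo "${deps}"   # aws-java-sdk.jar:azure-storage.jar
rm -rf "${tmp}"
```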
I *wish* there were a way to dynamically add subcommands to hadoop, mapred,
etc., but the code just isn't quite there yet. We can do usage now, but not
actual execution.
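For the libexec/tools idea in the third bullet, a rough sketch of what a helper for distcp could look like; the function name, the ".deps" suffix, and the layout are placeholders, not a real convention:

```shell
# Rough sketch only; every name here is a placeholder.
hadoop_add_classpath() { CLASSPATH="${CLASSPATH:+${CLASSPATH}:}$1"; }

# Contract: a subcommand that needs tools jars calls
# hadoop_classpath_tools_<name>, which reads only tools/<name>.deps
# instead of sweeping all of HADOOP_TOOLS_PATH onto the classpath.
hadoop_classpath_tools_distcp() {
  local jar
  while read -r jar; do
    hadoop_add_classpath "${HADOOP_TOOLS_HOME}/lib/${jar}"
  done < "${HADOOP_LIBEXEC_DIR}/tools/distcp.deps"
}

# Demo with a throwaway dependency file:
HADOOP_LIBEXEC_DIR=$(mktemp -d)
mkdir -p "${HADOOP_LIBEXEC_DIR}/tools"
printf 'hadoop-distcp.jar\n' > "${HADOOP_LIBEXEC_DIR}/tools/distcp.deps"
HADOOP_TOOLS_HOME=/opt/hadoop/share/hadoop/tools
hadoop_classpath_tools_distcp
echo "${CLASSPATH}"   # /opt/hadoop/share/hadoop/tools/lib/hadoop-distcp.jar
rm -rf "${HADOOP_LIBEXEC_DIR}"
```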
One big question: How should this work proceed?
# Single patch
# Multiple patches with a strict commit dependency order
# Separate branch followed by a branch merge
Given that this work will likely be all or nothing, I'm not a fan of multiple patches.
> Rework hadoop-tools-dist
> ------------------------
>
> Key: HADOOP-12857
> URL: https://issues.apache.org/jira/browse/HADOOP-12857
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Affects Versions: 3.0.0
> Reporter: Allen Wittenauer
> Assignee: Allen Wittenauer
>
> As hadoop-tools grows bigger and bigger, it's becoming evident that having a
> single directory that gets sucked in is starting to become a big burden as
> the number of tools grows. Let's rework this to be smarter.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)