[ https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176220#comment-15176220 ]
Allen Wittenauer commented on HADOOP-12857:
-------------------------------------------
I have some sample code working. It was very enlightening and I know what to
do now. If we really do want to keep one directory, here's my current plan of
attack:
* Truly optional components (s3, azure, openstack, kafka, etc.) will have a
shellprofile built that users can enable by doing the necessary incantations.
I'm currently thinking I might be able to add content to hadoop-env.sh at build
time to actually turn these things on via a single env-var setting, or one per
feature. No promises. (Yes, I'm currently looking for my "Black Hat of Bash
Wizardry" to make this happen.) Worst case, it'll be a "copy and rename into
HADOOP_CONF_DIR".
* With some help from [~raviprak] to make me see the forest for the trees, I
can now build shell-parseable dependency lists at build time. I have two ways
I can process this: I can either store these lists in the hadoop-dist target
directory, or store them in each tool's own target directory under a
well-known name and use find to build the necessary shell magic at build time.
I'm leaning towards the latter, since that will allow mvn clean to work in
hadoop-dist in the expected way: there won't be a hidden dependency on
hadoop-tools having been built before mvn package.
* distch, distcp, archive-logs, etc. are extremely problematic. Using shell
profiles for these WILL NOT WORK since they a) aren't really optional and b)
removing them from the command line tools won't really help anyone. Currently
these commands load all of HADOOP_TOOLS_PATH, which is awful. I want to add to
libexec/ a tools directory that stores helper functions for the tools jars
required by the various subcommands. It will use code similar to, but distinct
from, the optional-component handling. It will key off a different filename
for the dependency list, and there will need to be a contract between the
helper function names and the dependency file name. (This sounds worse than it
is.)
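To make the first bullet concrete, here's a minimal sketch of what a generated shellprofile plus a hadoop-env.sh toggle could look like. Every name in it (HADOOP_OPTIONAL_TOOLS, _s3_hadoop_classpath, hadoop_add_classpath, HADOOP_TOOLS_HOME) is an illustrative assumption, not settled API:

```shell
# Hypothetical sketch only; all names below are placeholders.
# Stub standing in for the real classpath helper in hadoop-functions.sh:
hadoop_add_classpath() { CLASSPATH="${CLASSPATH:+${CLASSPATH}:}$1"; }

# A generated shellprofile hook for an optional component (here: s3).
# It only extends the classpath when the user opts in via a single
# env-var setting in hadoop-env.sh.
_s3_hadoop_classpath() {
  if [[ ",${HADOOP_OPTIONAL_TOOLS}," == *",s3,"* ]]; then
    hadoop_add_classpath "${HADOOP_TOOLS_HOME}/lib/hadoop-aws.jar"
  fi
}

# What a user might put in hadoop-env.sh to turn the feature on:
HADOOP_OPTIONAL_TOOLS=s3
HADOOP_TOOLS_HOME=/opt/hadoop/share/hadoop/tools
_s3_hadoop_classpath
echo "${CLASSPATH}"   # /opt/hadoop/share/hadoop/tools/lib/hadoop-aws.jar
```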
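And a toy illustration of the well-known-name + find idea from the second bullet; the filename ".tool-classpath" and the directory layout are made up for the example:

```shell
# Toy illustration only: the ".tool-classpath" name and layout are
# invented for this sketch, not an actual build convention.
tmp=$(mktemp -d)
mkdir -p "${tmp}/hadoop-aws/target" "${tmp}/hadoop-azure/target"
echo "aws-java-sdk.jar"  > "${tmp}/hadoop-aws/target/.tool-classpath"
echo "azure-storage.jar" > "${tmp}/hadoop-azure/target/.tool-classpath"

# At build time, locate every tool's dependency list by its well-known
# name and fold the contents into one shell-consumable value:
deps=$(find "${tmp}" -name .tool-classpath -exec cat {} + | sort | paste -sd: -)
echo "${deps}"   # aws-java-sdk.jar:azure-storage.jar
rm -rf "${tmp}"
```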
I *wish* there were a way to dynamically add subcommands to hadoop, mapred,
etc., but the code just isn't quite there yet. We can do usage now, but not
actual execution.
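For the libexec/tools idea in the third bullet, a rough sketch of what a helper for distcp could look like; the function name, the ".deps" suffix, and the layout are placeholders, not a real convention:

```shell
# Rough sketch only; every name here is a placeholder.
hadoop_add_classpath() { CLASSPATH="${CLASSPATH:+${CLASSPATH}:}$1"; }

# Contract: a subcommand that needs tools jars calls
# hadoop_classpath_tools_<name>, which reads only tools/<name>.deps
# instead of sweeping all of HADOOP_TOOLS_PATH onto the classpath.
hadoop_classpath_tools_distcp() {
  local jar
  while read -r jar; do
    hadoop_add_classpath "${HADOOP_TOOLS_HOME}/lib/${jar}"
  done < "${HADOOP_LIBEXEC_DIR}/tools/distcp.deps"
}

# Demo with a throwaway dependency file:
HADOOP_LIBEXEC_DIR=$(mktemp -d)
mkdir -p "${HADOOP_LIBEXEC_DIR}/tools"
printf 'hadoop-distcp.jar\n' > "${HADOOP_LIBEXEC_DIR}/tools/distcp.deps"
HADOOP_TOOLS_HOME=/opt/hadoop/share/hadoop/tools
hadoop_classpath_tools_distcp
echo "${CLASSPATH}"   # /opt/hadoop/share/hadoop/tools/lib/hadoop-distcp.jar
rm -rf "${HADOOP_LIBEXEC_DIR}"
```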
One big question: How should this work proceed?
# Single patch
# Multiple patches with a strict commit dependency order
# Separate branch followed by a branch merge
Given that this work will likely be all or nothing, I'm not a fan of multiple patches.
> Rework hadoop-tools-dist
> ------------------------
>
> Key: HADOOP-12857
> URL: https://issues.apache.org/jira/browse/HADOOP-12857
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Affects Versions: 3.0.0
> Reporter: Allen Wittenauer
> Assignee: Allen Wittenauer
>
> As hadoop-tools grows bigger and bigger, it's becoming evident that having a
> single directory that gets sucked in is starting to become a big burden as
> the number of tools grows. Let's rework this to be smarter.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)