Alright, I think we've discussed enough on this and everybody seems to agree about a top level hadoop-tools module.
Time to get into the action. I've filed HADOOP-7624. Amareshwari, we can track the rest of the implementation-related details, and the answers to your specific questions, there.

Thanks everyone for putting in your thoughts here.

+Vinod

On Fri, Sep 9, 2011 at 10:55 AM, Rottinghuis, Joep <jrottingh...@ebay.com> wrote:

> If hadoop-tools will be built as part of hadoop-common, then none of these
> tools should be allowed to have a dependency on hdfs or mapreduce.
> The converse is also true: when tools do have any such dependency, they
> cannot be built as part of hadoop-common.
> We cannot have circular dependencies like that.
>
> That is probably obvious, but I'm just saying...
>
> Joep
> ________________________________________
> From: Amareshwari Sri Ramadasu [amar...@yahoo-inc.com]
> Sent: Wednesday, September 07, 2011 9:33 PM
> To: mapreduce-...@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> It is good to have hadoop-tools module separately. But as I asked before we
> need to answer some questions here. I'm trying to answer them myself.
> Comments are welcome.
>
> > 1. Should the patches for tools be created against Hadoop Common?
> Here, I meant: should the Hadoop common mailing list be used, or should we
> have a separate mailing list for Tools? I agree with Vinod here, that we can
> tie it to the Hadoop-common jira/mailing lists.
>
> > 2. What will happen to the tools test automation? Will it run as part
> > of Hadoop Common tests?
> Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop
> common if we use the Hadoop common mailing list for this.
> Also, I propose every patch build of HDFS and MAPREDUCE should also run
> tools tests to make sure nothing is broken. That would ease the maintenance
> of hadoop-tools module. I presume tools tests should not take much time
> (something like not more than 30 minutes).
>
> > 3. Will it introduce a dependency from MapReduce to Common?
> > Or is this taken care of in Mavenization?
> I'm not sure whether Mavenization can take care of this.
>
> Thanks
> Amareshwari
>
> On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jrottingh...@ebay.com> wrote:
>
> Does a separate hadoop-tools module imply that there will be a separate
> Jenkins build as well?
>
> Thanks,
>
> Joep
> ________________________________________
> From: Alejandro Abdelnur [t...@cloudera.com]
> Sent: Wednesday, September 07, 2011 11:35 AM
> To: mapreduce-...@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> Makes sense
>
> On Wed, Sep 7, 2011 at 11:32 AM, <milind.bhandar...@emc.com> wrote:
>
> > +1 for separate hadoop-tools module. However, if a tool is broken at
> > release time, and no one comes forward to fix it, it should be removed.
> > (i.e. Unlike contrib modules, where build and test failures were
> > tolerated.)
> >
> > - milind
> >
> > On 9/7/11 11:27 AM, "Mahadev Konar" <maha...@hortonworks.com> wrote:
> >
> > >I like the idea of having tools as a separate module and I don't think
> > >that it will be a dumping ground unless we choose to make one of it.
> > >
> > >+1 for hadoop tools module under trunk.
> > >
> > >thanks
> > >mahadev
> > >
> > >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <t...@cloudera.com>
> > >wrote:
> > >> Agreed, we should not have a dumping ground. IMO, what would go into
> > >> hadoop-tools (i.e. distcp, streaming and someone could argue for
> > >> FsShell as well) are effectively hadoop CLI utilities. Having them in
> > >> a separate module rather than in the core modules (common, hdfs,
> > >> mapreduce) does not mean that they are secondary things, just
> > >> modularization. Also it will help to get those tools to use public
> > >> interfaces of the core modules, and when we finally have a clean
> > >> hadoop-client layer, those tools should only depend on that.
> > >>
> > >> Finally, the fact that tools would end up under trunk/hadoop-tools
> > >> does not prevent the packaging for HDFS and MAPREDUCE from bundling
> > >> the same/different tools.
> > >>
> > >> +1 for hadoop-tools/ (not binding)
> > >>
> > >> Thanks.
> > >>
> > >>
> > >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <eric...@gmail.com> wrote:
> > >>
> > >>> Mapreduce and HDFS are distinct functions of Hadoop. They are loosely
> > >>> coupled. If we have a tools aggregator module, it will not have as
> > >>> clearly distinct a function as other Hadoop modules. Hence, it is
> > >>> possible for a tool to depend on both HDFS and map reduce. If
> > >>> something broke in the tools module, it is unclear which subproject's
> > >>> responsibility it is to maintain the tools function. Therefore, it is
> > >>> safer to send tools to incubator or apache extra rather than deposit
> > >>> the utility tools in a tools subcategory. There are many short-lived
> > >>> projects that attempt to associate themselves with Hadoop but are not
> > >>> being maintained. It would be better to spin off those utility
> > >>> projects than use Hadoop as a dumping ground.
> > >>>
> > >>> In the previous discussion about removing contrib, most people were in
> > >>> favor of doing so, and only a few contrib owners were reluctant to
> > >>> remove contrib. Fewer people have participated in restoring the
> > >>> functionality of broken contrib projects. History speaks for itself.
> > >>> -1 (non-binding) for hadoop-tools.
> > >>>
> > >>> regards,
> > >>> Eric
> > >>>
> > >>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <t...@cloudera.com>
> > >>> wrote:
> > >>> > Eric,
> > >>> >
> > >>> > Personally I'm fine either way.
> > >>> >
> > >>> > Still, I fail to see why generic/categorized tools increase/reduce
> > >>> > the risk of dead code and how they make package & deployment
> > >>> > more difficult/easier.
> > >>> >
> > >>> > Would you please explain this?
> > >>> >
> > >>> > Thanks.
> > >>> >
> > >>> > Alejandro
> > >>> >
> > >>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <eric...@gmail.com> wrote:
> > >>> >
> > >>> >> Option #2, proposed by Amareshwari, seems like a better proposal. We
> > >>> >> don't want to repeat the history of contrib with hadoop-tools.
> > >>> >> Having a generic module like hadoop-tools increases the risk of
> > >>> >> accumulating dead code. It would be better to categorize the hdfs-
> > >>> >> or mapreduce-specific tools in their respective subcategories. It
> > >>> >> is also easier to manage from a package/deployment perspective.
> > >>> >>
> > >>> >> regards,
> > >>> >> Eric
> > >>> >>
> > >>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> > >>> >>
> > >>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <a...@apache.org> wrote:
> > >>> >> >>
> > >>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> > >>> >> >>> We still need to answer Amareshwari's question (2) she asked some
> > >>> >> >>> time back about the automated code compilation and test execution
> > >>> >> >>> of the tools module.
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>>>> My #1 question is if tools is basically contrib reborn. If not,
> > >>> >> >>>>> what makes it different?
> > >>> >> >>
> > >>> >> >>
> > >>> >> >> I'm still waiting for this answer as well.
> > >>> >> >>
> > >>> >> >> Until such, I would be pretty much against a tools module.
> > >>> >> >> Changing the name of the dumping ground doesn't make it any less
> > >>> >> >> of a dumping ground.
> > >>> >> >
> > >>> >> > IMO if the tools module only gets stuff like distcp that's maintained
> > >>> >> > then it's not contrib, if it contains all the stuff from the current
> > >>> >> > MR contrib then tools is just a re-labeling of contrib.
> > >>> >> > Given that this proposal only covers moving distcp to tools, it
> > >>> >> > doesn't sound like contrib to me.
> > >>> >> >
> > >>> >> > Thanks,
> > >>> >> > Eli