Ian, it looks like you have some experience with Maven. I have heard a lot of good things about it from many people on many different projects, but I have no personal experience with it to speak of. Can you vouch for it? What do you like about it? Is there anything that you don't like or want to get away from?
Ken

On Jan 29, 2008 10:41 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> my 2c's
>
> Grant Ingersoll wrote:
> > But would people prefer getting the jars separately? I do think there
> > is some common housekeeping code, but I also don't want to
> > overemphasize it when it comes to developing an individual algorithm.
> > In other words, if a bayes classifier and an SVM implementation could
> > share a common framework, but it would end up being really confusing,
> > versus them each being more or less cleanly separated and logical, I
> > think I would favor the separated. By the same token, if they can
> > work beautifully together, then, that would argue for more common code.
> >
> are we planning on making separate releases for the code?
> does having these bundled together somehow impact the performance or
> functionality of other algorithms?
> would the combined size of the jar be less than 2-3m?
>
> if the answer to all of these is no, we should just have a single jar.
> Size should not be an issue here, development/operational speed should be.
> It is much easier to manage a single jar operationally IMHO.
>
> > As for Hadoop and HBase, that is just two potential libraries. We are
> > potentially talking 10+. Would you want to download a huge jar that
> > contains everything when all you want is a single algorithm? Granted,
> > that can be done from one source tree, but I wonder if that makes it
> > harder.
> yep. but in these days of maven and the like I have no idea how many
> jars I'm actually downloading, it just does it.
>
> > But, I do take away that we probably should just start simple and not
> > worry about a complex build just yet. I think it is safe to say that
> > up through our first official release we can feel free to change
> > things around if we have to.
> >
>
> yep..
>
> > -Grant
> >
> >
> > On Jan 29, 2008, at 9:04 PM, Mason Tang wrote:
> >
> >> +1
> >>
> >> Not going to repeat the same arguments, but one other thing is that
> >> almost all of the algorithms are going to (or at least should) share
> >> some common housekeeping code, the main chunk of which will probably
> >> be IO. Functionally, I don't think an individual algorithm is
> >> significant enough to warrant its own project, and many of them might
> >> wind up sharing common interfaces.
> >>
> >> ~ Mason
> >>
> >> Jeff Eastman wrote:
> >>> +1
> >>> A single project facilitates refactoring and promotes consistency of
> >>> design. If there's not enough code in Hadoop+Hbase to justify multiple
> >>> projects it would be premature abstraction to organize Mahout that way.
> >>> Let's keep it simple...
> >>> Jeff
> >>> -----Original Message-----
> >>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> >>> Sent: Tuesday, January 29, 2008 4:21 PM
> >>> To: [email protected]
> >>> Subject: Re: Thinking about Mahout layout, builds, etc.
> >>> Initially, developers will be hitting bugs or bad design all over the
> >>> place so they would favor one project. Also, with good package design,
> >>> you get most of the benefits of multiple projects.
> >>> So why not start simple and migrate to complicated later?
> >>> On 1/29/08 3:15 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:
> >>>> Thinking about these alternatives from an Eclipse user's point of
> >>>> view, the original proposal would seem to encourage multiple projects
> >>>> (one per algorithm + a common project) while the second would
> >>>> encourage a single project containing multiple packages. Depending
> >>>> upon the amount of code that would reside in each algorithm, one or
> >>>> the other might be preferable.
> >>>>
> >>>> Would a given developer typically be working on the entire library
> >>>> (single project favoring) or just on one or two algorithms (multiple
> >>>> project favoring)?
> >>>>
> >>>> Jeff
> >>>>
> >>>> -----Original Message-----
> >>>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> >>>> Sent: Tuesday, January 29, 2008 2:43 PM
> >>>> To: [email protected]
> >>>> Subject: Re: Thinking about Mahout layout, builds, etc.
> >>>>
> >>>> I think that having multiple source roots is a pain. That is what
> >>>> packages are for.
> >>>>
> >>>> I would recommend instead:
> >>>>
> >>>> - at the top level, there should be trunk, tags, releases as is
> >>>>   typical in an SVN based project.
> >>>>
> >>>> - below trunk and any tag or release there should be:
> >>>>
> >>>>     docs
> >>>>     lib
> >>>>     src/org/apache/mahout
> >>>>
> >>>> Below the source directory, there should be packages common,
> >>>> algorithmA, algorithmB and all tests should be located near the
> >>>> associated source. If it is really desirable to separate tests from
> >>>> normal source (I have done it both ways and find having the tests
> >>>> nearby beneficial), then there can be a parallel tree next to src
> >>>> called "test".
> >>>>
> >>>> The target of compilation should be a single jar file.
> >>>>
> >>>>
> >>>> On 1/29/08 2:26 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> I am thinking a structure like the following would be useful for
> >>>>> getting started:
> >>>>> mahout/trunk/
> >>>>>     docs
> >>>>>     common/
> >>>>>         src/
> >>>>>             main/
> >>>>>             test/
> >>>>>         docs/
> >>>>>         lib/
> >>>>>     algorithmA/
> >>>>>         Similar to common, but for this algorithm
> >>>>>     algB
> >>>>>         ...
> >>>>>     ...
> >>>>>
> >>>>> Where algorithmA, B, etc. are the various libraries we intend to
> >>>>> implement. We can hold off on creating them until we have some code,
> >>>>> but was thinking it would be good to have the general layout in mind.
> >>>>>
> >>>>> Of course, this is expandable and changeable. What do others think?
> >>>>>
> >>>>> On a related note, one of the things we discussed pre-Apache, was
> >>>>> the general sense that we shouldn't feel the need to create an all
> >>>>> encompassing framework. The basic gist of this being that any given
> >>>>> library could be completely independent of the others (with maybe
> >>>>> the exception that they share a common library). My gut says this is
> >>>>> the way to get started, but that it may evolve over time once we
> >>>>> have some running time together and can start to recognize
> >>>>> synergies, such that maybe by the time we get to 1.0 of Mahout there
> >>>>> may be more common code than we originally thought. The "common"
> >>>>> area above can serve as the area for utilities, classes, common
> >>>>> Hadoop extensions, etc. that are shared between the various
> >>>>> algorithms, but I would also say let's not try to prematurely
> >>>>> optimize across the algorithms just yet.
> >>>>>
> >>>>> Anyone else have any preference on this?
> >>>>>
> >>>>> -Grant
> >>>>>
> >>
> >> --
> >> Mason Tang '10, Course 6-3
> >> Address: Burton-Conner 224A     Email: [EMAIL PROTECTED]
> >> 410 Memorial Dr.                Phone: 508-414-5811
> >> Cambridge, MA 02139             WWW: www.geekbyday.com
> >

--
Ken Montanez | 510.681.5576
