RE: Thinking about Mahout layout, builds, etc.

Steve Rowe Tue, 29 Jan 2008 15:48:07 -0800

On 01/29/2008 at 6:44 PM, Lukas Vlcek wrote:
> I would prefer to have an option not to work with whole library but
> select only specific algorithms and optionally their particular
> modifications.


+1

> > Thinking about these alternatives from an Eclipse user's point of view,
> > the original proposal would seem to encourage multiple projects (one
> > per algorithm + a common project) while the second would encourage a
> > single project containing multiple packages. Depending upon the amount
> > of code that would reside in each algorithm, one or the other might be
> > preferable.
> > 
> > Would a given developer typically be working on the entire library
> > (single project favoring) or just on one or two algorithms (multiple
> > project favoring)?
> > 
> > Jeff
> > 
> > -----Original Message-----
> > From: Ted Dunning [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 29, 2008 2:43 PM
> > To: [email protected]
> > Subject: Re: Thinking about Mahout layout, builds, etc.
> > 
> > 
> > 
> > > I think that having multiple source roots is a pain.  That is what
> > > packages are for.
> > 
> > I would recommend instead:
> > 
> > - at the top level, there should be trunk, tags, releases as is typical
> > in an SVN based project.
> > 
> > - below trunk and any tag or release there should be:
> > 
> >   docs
> >   lib
> >   src/org/apache/mahout
> > 
> > > Below the source directory, there should be packages common,
> > > algorithmA, algorithmB and all tests should be located near the
> > > associated source.
> > 
> > If it is really desirable to separate tests from normal source (I have
> > done it both ways and find having the tests nearby beneficial), then
> > there can be a parallel tree next to src called "test".
> > 
> > The target of compilation should be a single jar file.
> > 
> > 
> > On 1/29/08 2:26 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
> > 
> > > I am thinking a structure like the following would be useful for
> > > getting started:
> > > mahout/trunk/
> > >    docs
> > > >    common/
> > > >          src/
> > > >             main/
> > > >             test/
> > > >          docs/
> > > >          lib/
> > >    algorithmA/
> > >         Similar to common, but for this algorithm algB ...
> > >     ...
> > > 
> > > Where algorithmA, B, etc. are the various libraries we intend to
> > > implement.  We can hold off on creating them until we have some code,
> > > but was thinking it would be good to have the general layout in mind.
> > > 
> > > Of course, this is expandable and changeable.  What do others think?
> > > 
> > > On a related note, one of the things we discussed pre-Apache, was the
> > > general sense that we shouldn't feel the need to create an all
> > > encompassing framework.  The basic gist of this being that any given
> > > library could be completely independent of the others (with maybe the
> > > exception that they share a common library).  My gut says this is the
> > > way to get started, but that it may evolve over time once we have some
> > > running time together and can start to recognize synergies, such that
> > > maybe by the time we get to 1.0 of Mahout there may be more common
> > > code than we originally thought.  The "common" area above can serve as
> > > the area for utilities, classes, common Hadoop extensions, etc. that
> > > are shared between the various algorithms, but I would also say let's
> > > not try to prematurely optimize across the algorithms just yet.
> > > 
> > > Anyone else have any preference on this?
> > > 
> > > -Grant
>
