Re: Thinking about Mahout layout, builds, etc.

Ken Montanez Tue, 29 Jan 2008 17:24:50 -0800

Just curious if this technology stack has been looked at: Maven/Hudson/Ivy.
Many groups have used these projects with great success and might address
some of the initial questions that might come up during this initial phase
of the project.


Ken

On Jan 29, 2008 4:38 PM, Ken Montanez <[EMAIL PROTECTED]> wrote:

> I agree. Also this is a good starting point. If we find that our initial
> approach is not sufficient it will be easier to split from one source tree
> to many than it will be to splice many to one; I am also trying to hint at
> the fact that having different source trees will tempt some to follow
> different conventions than if everything is in one source tree (more context
> to your work).
>
> Thanks,
> Ken
>
>
> On Jan 29, 2008 4:22 PM, Vadim Zaliva <[EMAIL PROTECTED]> wrote:
>
> > On Jan 29, 2008, at 16:13, Yousef Ourabi wrote:
> >
> > I am with Yousef. I would prefer a single-rooted source tree
> > but would leave an option of building multiple jars. Actually
> > we can build one jar per algorithm, plus a special jumbo jar containing
> > everything.
> >
> > Sincerely,
> > Vadim
> >
> > > I'm with Ted on this one.
> > >
> > > +1 for tags, trunk, branches and diff. packages.
> > >
> > > Where I differ is with the output. I can see some scenarios where it
> > > makes sense for ant dist-alg1, ant dist-alg2 -- this would reduce the
> > > footprint in applications that only need one vs the other.
> > >
> > > Having multiple projects is just unnecessary overhead.
> > >
> > > -Yousef
> > >
> > > On 1/29/08, Steve Rowe <[EMAIL PROTECTED]> wrote:
> > >>
> > >> On 01/29/2008 at 6:44 PM, Lukas Vlcek wrote:
> > >>> I would prefer to have an option not to work with whole library but
> > >>> select only specific algorithms and optionally their particular
> > >>> modifications.
> > >>
> > >> +1
> > >>
> > >>>> Thinking about these alternatives from an Eclipse user's point of view,
> > >>>> the original proposal would seem to encourage multiple projects (one
> > >>>> per algorithm + a common project) while the second would encourage a
> > >>>> single project containing multiple packages. Depending upon the amount
> > >>>> of code that would reside in each algorithm, one or the other might be
> > >>>> preferable.
> > >>>>
> > >>>> Would a given developer typically be working on the entire library
> > >>>> (single project favoring) or just on one or two algorithms (multiple
> > >>>> project favoring)?
> > >>>>
> > >>>> Jeff
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> > >>>> Sent: Tuesday, January 29, 2008 2:43 PM
> > >>>> To: [email protected]
> > >>>> Subject: Re: Thinking about Mahout layout, builds, etc.
> > >>>>
> > >>>>
> > >>>>
> > >>>> I think that having multiple source roots is a pain.  That is what
> > >>>> packages are for.
> > >>>>
> > >>>> I would recommend instead:
> > >>>>
> > >>>> - at the top level, there should be trunk, tags, releases as is typical
> > >>>> in an SVN based project.
> > >>>>
> > >>>> - below trunk and any tag or release there should be:
> > >>>>
> > >>>>  docs
> > >>>>  lib
> > >>>>  src/org/apache/mahout
> > >>>>
> > >>>> Below the source directory, there should be packages common,
> > >>>> algorithmA, algorithmB and all tests should be located near the
> > >>>> associated source.
> > >>>>
> > >>>> If it is really desirable to separate tests from normal source (I have
> > >>>> done it both ways and find having the tests nearby beneficial), then
> > >>>> there can be a parallel tree next to src called "test".
> > >>>>
> > >>>> The target of compilation should be a single jar file.
> > >>>>
> > >>>>
> > >>>> On 1/29/08 2:26 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
> > >>>>
> > >>>>> I am thinking a structure like the following would be useful for
> > >>>>> getting started:
> > >>>>> mahout/trunk/
> > >>>>>   docs
> > >>>>>   common/
> > >>>>>     src/
> > >>>>>       main/
> > >>>>>       test/
> > >>>>>     docs/
> > >>>>>     lib/
> > >>>>>   algorithmA/
> > >>>>>     Similar to common, but for this algorithm
> > >>>>>   algB ...
> > >>>>>   ...
> > >>>>>
> > >>>>> Where algorithmA, B, etc. are the various libraries we intend to
> > >>>>> implement.  We can hold off on creating them until we have some code,
> > >>>>> but was thinking it would be good to have the general layout in mind.
> > >>>>>
> > >>>>> Of course, this is expandable and changeable.  What do others
> > >>>>> think?
> > >>>>>
> > >>>>> On a related note, one of the things we discussed pre-Apache, was the
> > >>>>> general sense that we shouldn't feel the need to create an all
> > >>>>> encompassing framework.  The basic gist of this being that any given
> > >>>>> library could be completely independent of the others (with maybe the
> > >>>>> exception that they share a common library).  My gut says this is the
> > >>>>> way to get started, but that it may evolve over time once we have some
> > >>>>> running time together and can start to recognize synergies, such that
> > >>>>> maybe by the time we get to 1.0 of Mahout there may be more common
> > >>>>> code than we originally thought.  The "common" area above can serve as
> > >>>>> the area for utilities, classes, common Hadoop extensions, etc. that
> > >>>>> are shared between the various algorithms, but I would also say let's
> > >>>>> not try to prematurely optimize across the algorithms just yet.
> > >>>>>
> > >>>>> Anyone else have any preference on this?
> > >>>>>
> > >>>>> -Grant
> > >>>
> > >>
> > >>
> >
> >
>


-- 
Ken Montanez | 510.681.5576
