Re: Thinking about Mahout layout, builds, etc.

Vadim Zaliva Tue, 29 Jan 2008 16:23:12 -0800

On Jan 29, 2008, at 16:13, Yousef Ourabi wrote:

I am with Yousef. I would prefer a single-rooted source tree
but would leave the option of building multiple jars. Actually,
we could build one jar per algorithm, plus a special jumbo jar
containing everything.


Sincerely,
Vadim

I'm with Ted on this one.

+1 for tags, trunk, branches and different packages.

Where I differ is with the output. I can see some scenarios where it
makes sense to have ant dist-alg1, ant dist-alg2 -- this would reduce
the footprint in applications that only need one vs. the other.

Having multiple projects is just unnecessary overhead.

-Yousef

On 1/29/08, Steve Rowe <[EMAIL PROTECTED]> wrote:


On 01/29/2008 at 6:44 PM, Lukas Vlcek wrote:

I would prefer to have an option not to work with whole library but
select only specific algorithms and optionally their particular
modifications.

+1

Thinking about these alternatives from an Eclipse user's point of view,
the original proposal would seem to encourage multiple projects (one
per algorithm + a common project) while the second would encourage a
single project containing multiple packages. Depending upon the amount
of code that would reside in each algorithm, one or the other might be
preferable.

Would a given developer typically be working on the entire library
(single project favoring) or just on one or two algorithms (multiple
project favoring)?

Jeff

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 2:43 PM
To: [email protected]
Subject: Re: Thinking about Mahout layout, builds, etc.



I think that having multiple source roots is a pain.  That is what
packages are for.

I would recommend instead:

- at the top level, there should be trunk, tags, releases as is
typical in an SVN-based project.

- below trunk and any tag or release there should be:

 docs
 lib
 src/org/apache/mahout

Below the source directory, there should be packages common,
algorithmA, algorithmB and all tests should be located near the
associated source.

If it is really desirable to separate tests from normal source (I have
done it both ways and find having the tests nearby beneficial), then
there can be a parallel tree next to src called "test".

The target of compilation should be a single jar file.


On 1/29/08 2:26 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:

I am thinking a structure like the following would be useful for
getting started:
mahout/trunk/
   docs
   common/
        src/
            main/
            test/
        docs/
        lib/
   algorithmA/
        Similar to common, but for this algorithm
   algB ...
   ...

Where algorithmA, B, etc. are the various libraries we intend to
implement.  We can hold off on creating them until we have some code,
but was thinking it would be good to have the general layout in mind.

Of course, this is expandable and changeable. What do others think?

On a related note, one of the things we discussed pre-Apache was the
general sense that we shouldn't feel the need to create an
all-encompassing framework. The basic gist of this being that any given
library could be completely independent of the others (with maybe the
exception that they share a common library).  My gut says this is the
way to get started, but that it may evolve over time once we have some
running time together and can start to recognize synergies, such that
maybe by the time we get to 1.0 of Mahout there may be more common
code than we originally thought. The "common" area above can serve as
the area for utilities, classes, common Hadoop extensions, etc. that
are shared between the various algorithms, but I would also say let's
not try to prematurely optimize across the algorithms just yet.

Anyone else have any preference on this?

-Grant
