Re: Thinking about Mahout layout, builds, etc.

Ted Dunning Tue, 29 Jan 2008 17:29:44 -0800

I can testify to the value of Hudson in the Hadoop project.  I have never
seen much value in Maven, especially for pure Java projects.  Can't comment
on Ivy.



On 1/29/08 5:24 PM, "Ken Montanez" <[EMAIL PROTECTED]> wrote:

> Just curious if this technology stack has been looked at: Maven/Hudson/Ivy.
> Many groups have used these projects with great success, and they might
> address some of the questions that will come up during this initial phase
> of the project.
> 
> Ken
> 
> On Jan 29, 2008 4:38 PM, Ken Montanez <[EMAIL PROTECTED]> wrote:
> 
>> I agree. Also, this is a good starting point. If we find that our initial
>> approach is not sufficient, it will be easier to split from one source tree
>> to many than it will be to splice many into one. I am also trying to hint at
>> the fact that having different source trees will tempt some to follow
>> different conventions than if everything is in one source tree (more context
>> to your work).
>> 
>> Thanks,
>> Ken
>> 
>> 
>> On Jan 29, 2008 4:22 PM, Vadim Zaliva <[EMAIL PROTECTED]> wrote:
>> 
>>> On Jan 29, 2008, at 16:13, Yousef Ourabi wrote:
>>> 
>>> I am with Yousef. I would prefer a single-rooted source tree
>>> but would leave the option of building multiple jars. Actually,
>>> we can build one jar per algorithm, plus a special jumbo jar containing
>>> everything.
>>> 
>>> Sincerely,
>>> Vadim
>>> 
>>>> I'm with Ted on this one.
>>>> 
>>>> +1 for tags, trunk, branches and different packages.
>>>> 
>>>> Where I differ is with the output. I can see some scenarios where it makes
>>>> sense to have separate targets such as ant dist-alg1 and ant dist-alg2; this
>>>> would reduce the footprint in applications that only need one and not the
>>>> other.
>>>> 
>>>> Having multiple projects is just unnecessary overhead.
>>>> 
>>>> -Yousef
>>>> 
>>>> On 1/29/08, Steve Rowe <[EMAIL PROTECTED]> wrote:
>>>>> 
>>>>> On 01/29/2008 at 6:44 PM, Lukas Vlcek wrote:
>>>>>> I would prefer to have an option not to work with whole library but
>>>>>> select only specific algorithms and optionally their particular
>>>>>> modifications.
>>>>> 
>>>>> +1
>>>>> 
>>>>>>> Thinking about these alternatives from an Eclipse user's point of view,
>>>>>>> the original proposal would seem to encourage multiple projects (one
>>>>>>> per algorithm + a common project) while the second would encourage a
>>>>>>> single project containing multiple packages. Depending upon the amount
>>>>>>> of code that would reside in each algorithm, one or the other might be
>>>>>>> preferable.
>>>>>>> 
>>>>>>> Would a given developer typically be working on the entire library
>>>>>>> (favoring a single project) or just on one or two algorithms (favoring
>>>>>>> multiple projects)?
>>>>>>> 
>>>>>>> Jeff
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
>>>>>>> Sent: Tuesday, January 29, 2008 2:43 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Thinking about Mahout layout, builds, etc.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I think that having multiple source roots is a pain.  That is what
>>>>>>> packages are for.
>>>>>>> 
>>>>>>> I would recommend instead:
>>>>>>> 
>>>>>>> - at the top level, there should be trunk, tags, releases as is typical
>>>>>>> in an SVN based project.
>>>>>>> 
>>>>>>> - below trunk and any tag or release there should be:
>>>>>>> 
>>>>>>>  docs
>>>>>>>  lib
>>>>>>>  src/org/apache/mahout
>>>>>>> 
>>>>>>> Below the source directory, there should be packages common,
>>>>>>> algorithmA and algorithmB, and all tests should be located near the
>>>>>>> associated source.
>>>>>>>
>>>>>>> If it is really desirable to separate tests from normal source (I have
>>>>>>> done it both ways and find having the tests nearby beneficial), then
>>>>>>> there can be a parallel tree next to src called "test".
>>>>>>> 
>>>>>>> The target of compilation should be a single jar file.
>>>>>>> 
>>>>>>> 
>>>>>>> On 1/29/08 2:26 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>>>>>>> 
>>>>>>>> I am thinking a structure like the following would be useful for
>>>>>>>> getting started:
>>>>>>>> mahout/trunk/
>>>>>>>>   docs/
>>>>>>>>   common/
>>>>>>>>     src/
>>>>>>>>       main/
>>>>>>>>       test/
>>>>>>>>     docs/
>>>>>>>>     lib/
>>>>>>>>   algorithmA/
>>>>>>>>     (similar to common, but for this algorithm)
>>>>>>>>   algorithmB/
>>>>>>>>   ...
>>>>>>>> 
>>>>>>>> Where algorithmA, B, etc. are the various libraries we intend to
>>>>>>>> implement.  We can hold off on creating them until we have some code,
>>>>>>>> but I was thinking it would be good to have the general layout in mind.
>>>>>>>>
>>>>>>>> Of course, this is expandable and changeable.  What do others think?
>>>>>>>> 
>>>>>>>> On a related note, one of the things we discussed pre-Apache was the
>>>>>>>> general sense that we shouldn't feel the need to create an
>>>>>>>> all-encompassing framework.  The basic gist is that any given library
>>>>>>>> could be completely independent of the others (with the possible
>>>>>>>> exception that they share a common library).  My gut says this is the
>>>>>>>> way to get started, but that it may evolve over time once we have some
>>>>>>>> running time together and can start to recognize synergies, such that
>>>>>>>> by the time we get to 1.0 of Mahout there may be more common code than
>>>>>>>> we originally thought.  The "common" area above can serve as the area
>>>>>>>> for utilities, classes, common Hadoop extensions, etc. that are shared
>>>>>>>> between the various algorithms, but I would also say let's not try to
>>>>>>>> prematurely optimize across the algorithms just yet.
>>>>>>>> 
>>>>>>>> Anyone else have any preference on this?
>>>>>>>> 
>>>>>>>> -Grant
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
>
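To illustrate the point raised in the thread that a single-rooted source tree does not rule out one jar per algorithm plus a jumbo jar, here is a minimal Java sketch. It assumes compiled classes live under org/apache/mahout/<package>/, following the src/org/apache/mahout layout proposed above, and that each per-algorithm jar needs only its own package plus the shared common package. The class name JarSelection and the class files in main() are hypothetical placeholders, not actual Mahout code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: decide which compiled entries from one source tree
// go into a per-algorithm jar versus the "jumbo" jar.
public class JarSelection {
    // Assumed root package, following the proposed src/org/apache/mahout layout.
    private static final String ROOT = "org/apache/mahout/";

    // Entries for one algorithm's jar: the shared "common" package plus that
    // algorithm's own package. The trailing "/" keeps "algorithmA" from also
    // matching a package named "algorithmAB".
    public static List<String> entriesFor(String algorithm, List<String> allEntries) {
        String commonPrefix = ROOT + "common/";
        String algorithmPrefix = ROOT + algorithm + "/";
        List<String> selected = new ArrayList<String>();
        for (String entry : allEntries) {
            if (entry.startsWith(commonPrefix) || entry.startsWith(algorithmPrefix)) {
                selected.add(entry);
            }
        }
        return selected;
    }

    // The jumbo jar simply contains every compiled entry.
    public static List<String> jumboEntries(List<String> allEntries) {
        return new ArrayList<String>(allEntries);
    }

    public static void main(String[] args) {
        // Hypothetical class files; only the package layout matters here.
        List<String> all = Arrays.asList(
            "org/apache/mahout/common/Vector.class",
            "org/apache/mahout/algorithmA/Trainer.class",
            "org/apache/mahout/algorithmB/Clusterer.class");
        System.out.println("algorithmA jar: " + entriesFor("algorithmA", all));
        System.out.println("jumbo jar entries: " + jumboEntries(all).size());
    }
}
```

In an actual build, the same prefix rule would more likely be expressed as include patterns on a fileset inside a per-algorithm Ant jar target (along the lines of the dist-alg1 idea above) than as hand-written Java.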
