Most algorithms will fall naturally into families and will be quite small.

What you say really has merit on a large project, but even all of Hadoop is
barely that large.


On 1/29/08 3:44 PM, "Lukas Vlcek" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Not only can each algorithm be seen as a separate project, but there can
> also be many ways to implement a single algorithm. I would not be
> surprised if one algorithm had many implementations, each suited to a
> different type of input data (dense vs. sparse) or offering a different
> accuracy-to-speed ratio, for example.
> 
> I would prefer to have the option of not working with the whole library,
> but selecting only specific algorithms and optionally their particular
> modifications.
> 
> Just my 2 cents.
> 
> Lukas
> 
> On Jan 30, 2008 12:15 AM, Jeff Eastman <[EMAIL PROTECTED]> wrote:
> 
>> Thinking about these alternatives from an Eclipse user's point of view,
>> the original proposal would seem to encourage multiple projects (one per
>> algorithm + a common project) while the second would encourage a single
>> project containing multiple packages. Depending upon the amount of code
>> that would reside in each algorithm, one or the other might be
>> preferable.
>> 
>> Would a given developer typically be working on the entire library
>> (single project favoring) or just on one or two algorithms (multiple
>> project favoring)?
>> 
>> Jeff
>> 
>> -----Original Message-----
>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, January 29, 2008 2:43 PM
>> To: [email protected]
>> Subject: Re: Thinking about Mahout layout, builds, etc.
>> 
>> 
>> 
>> I think that having multiple source roots is a pain.  That is what
>> packages are for.
>> 
>> I would recommend instead:
>> 
>> - at the top level, there should be trunk, tags, releases as is typical
>> in an SVN based project.
>> 
>> - below trunk and any tag or release there should be:
>> 
>>   docs
>>   lib
>>   src/org/apache/mahout
>> 
>> Below the source directory, there should be packages common, algorithmA,
>> algorithmB and all tests should be located near the associated source.
>> 
>> If it is really desirable to separate tests from normal source (I have
>> done it both ways and find having the tests nearby beneficial), then
>> there can be a parallel tree next to src called "test".
>> 
>> The target of compilation should be a single jar file.
>> 
>> 
>> On 1/29/08 2:26 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>> 
>>> I am thinking a structure like the following would be useful for
>>> getting started:
>>> mahout/trunk/
>>>    docs
>>>    common/
>>>          src/
>>>             main/
>>>             test/
>>>          docs/
>>>          lib/
>>>    algorithmA/
>>>         Similar to common, but for this algorithm
>>>    algB
>>>         ...
>>>     ...
>>> 
>>> Where algorithmA, B, etc. are the various libraries we intend to
>>> implement.  We can hold off on creating them until we have some code,
>>> but was thinking it would be good to have the general layout in mind.
>>> 
>>> Of course, this is expandable and changeable.  What do others think?
>>> 
>>> On a related note, one of the things we discussed pre-Apache was the
>>> general sense that we shouldn't feel the need to create an
>>> all-encompassing framework.  The basic gist is that any given library
>>> could be completely independent of the others (with maybe the
>>> exception that they share a common library).  My gut says this is the
>>> way to get started, but that it may evolve over time once we have some
>>> running time together and can start to recognize synergies, such that
>>> maybe by the time we get to 1.0 of Mahout there may be more common
>>> code than we originally thought.  The "common" area above can serve as
>>> the area for utilities, classes, common Hadoop extensions, etc. that
>>> are shared between the various algorithms, but I would also say let's
>>> not try to prematurely optimize across the algorithms just yet.
>>> 
>>> Anyone else have any preference on this?
>>> 
>>> -Grant
>>> 
>> 
>> 
> 
