Most algorithms will fall naturally into families and will be quite small.
What you say really has merit on a large project, but even all of Hadoop is
barely that large.

On 1/29/08 3:44 PM, "Lukas Vlcek" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Not only each algorithm can be seen as a separated project but also there
> can be many ways how one algorithm can be implemented as well. I would not
> be surprised by the fact that one algorithm can have many implementations
> each suitable for different type of input data (dense vs sparse) or having
> various accuracy to speed ratio for example.
>
> I would prefer to have an option not to work with whole library but select
> only specific algorithms and optionally their particular modifications.
>
> Just my 2 cents.
>
> Lukas
>
> On Jan 30, 2008 12:15 AM, Jeff Eastman <[EMAIL PROTECTED]> wrote:
>
>> Thinking about these alternatives from an Eclipse user's point of view,
>> the original proposal would seem to encourage multiple projects (one per
>> algorithm + a common project) while the second would encourage a single
>> project containing multiple packages. Depending upon the amount of code
>> that would reside in each algorithm, one or the other might be
>> preferable.
>>
>> Would a given developer typically be working on the entire library
>> (single project favoring) or just on one or two algorithms (multiple
>> project favoring)?
>>
>> Jeff
>>
>> -----Original Message-----
>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, January 29, 2008 2:43 PM
>> To: [email protected]
>> Subject: Re: Thinking about Mahout layout, builds, etc.
>>
>> I think that having multiple source roots is a pain. That is what
>> packages are for.
>>
>> I would recommend instead:
>>
>> - at the top level, there should be trunk, tags, releases as is typical
>>   in an SVN based project.
>>
>> - below trunk and any tag or release there should be:
>>
>>     docs
>>     lib
>>     src/org/apache/mahout
>>
>> Below the source directory, there should be packages common, algorithmA,
>> algorithmB and all tests should be located near the associated source.
>>
>> If it is really desirable to separate tests from normal source (I have
>> done it both ways and find having the tests nearby beneficial), then
>> there can be a parallel tree next to src called "test".
>>
>> The target of compilation should be a single jar file.
>>
>>
>> On 1/29/08 2:26 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>>
>>> I am thinking a structure like the following would be useful for
>>> getting started:
>>> mahout/trunk/
>>>     docs
>>>     common/
>>>         src/
>>>             main/
>>>             test/
>>>         docs/
>>>         lib/
>>>     algorithmA/
>>>         Similar to common, but for this algorithm
>>>     algB
>>>         ...
>>>     ...
>>>
>>> Where algorithmA, B, etc. are the various libraries we intend to
>>> implement. We can hold off on creating them until we have some code,
>>> but was thinking it would be good to have the general layout in mind.
>>>
>>> Of course, this is expandable and changeable. What do others think?
>>>
>>> On a related note, one of the things we discussed pre-Apache, was the
>>> general sense that we shouldn't feel the need to create an all
>>> encompassing framework. The basic gist of this being that any given
>>> library could be completely independent of the others (with maybe the
>>> exception that they share a common library). My gut says this is the
>>> way to get started, but that it may evolve over time once we have some
>>> running time together and can start to recognize synergies, such that
>>> maybe by the time we get to 1.0 of Mahout there may be more common
>>> code than we originally thought. The "common" area above can serve as
>>> the area for utilities, classes, common Hadoop extensions, etc. that
>>> are shared between the various algorithms, but I would also say let's
>>> not try to prematurely optimize across the algorithms just yet.
>>>
>>> Anyone else have any preference on this?
>>>
>>> -Grant
