Hi Grant,
Good points raised. My views are below.
> This brings up an ... find patches?
My vote is in favour of having people check out code directly instead of
finding patches.
To keep things clean, we can have separate directory structures for
checking in non-parallel or incomplete versions of the code, and make
their compilation an optional target in the build. I see two advantages
of doing this early in the project:
1. Non-parallel (but functional) versions can serve as good enough tools
   to demonstrate the algorithm in action, and can actually simplify the
   process of understanding the algorithm itself. This is beneficial to
   people new to machine learning/map-reduce.
2. The non-parallel versions can provide good insight into how to
   actually map-reduce the underlying algorithm, and can form a basis for
   discussion between developers.
> Also, how do people want ... build the community, etc.
Sounds like a good idea. I agree such examples would serve as a good
demonstration of the usage of our library and its capabilities, and
would definitely benefit the community. To start with, I would imagine
that as soon as we have our first algorithm in place, we could pick a
publicly available dataset to build a simple example. This would also
enable us to measure and compare the performance of algorithms on
different kinds of datasets later on.
-Ankur
-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 14, 2008 7:22 PM
To: [email protected]
Subject: Community development was Re: [jira] Updated: (MAHOUT-4) Simple
prototype for Expectation Maximization (EM)
Hi Ankur,
Thanks for the contribution!
This brings up an interesting community point. Would people rather we
commit patches earlier, even though they aren't parallelized yet or
completely ready to go? I know once the project is more mature, I
wouldn't favor this, but I am wondering if it will help grease the
wheels, so to speak, in the early stages if people can just go check out
the code and work on it, versus having to go find patches?
Also, how do people want to handle creating examples? For instance, it
would probably be useful to have some simple examples using publicly
available datasets for our algorithms. I don't know that they belong in
the core library (although maybe they do), but they could definitely be
shipped as examples/tutorials/contrib. I can see these kinds of things
going a long way toward getting us into the hands of students who are
learning ML, etc., which should also help build the community.
-Grant
On Feb 13, 2008, at 7:55 AM, Ankur (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Ankur updated MAHOUT-4:
> -----------------------
>
> Attachment: PLSI_EM.patch
>
> Here is the prototype implementation of Probabilistic Latent Semantic
> Indexing (PLSI) that uses Expectation Maximization. Please refer to the
> javadoc comments for an explanation.
>
> Feel free to experiment with the code and have fun :-)
>
>> Simple prototype for Expectation Maximization (EM)
>> --------------------------------------------------
>>
>> Key: MAHOUT-4
>> URL: https://issues.apache.org/jira/browse/MAHOUT-4
>> Project: Mahout
>> Issue Type: New Feature
>> Reporter: Ankur
>> Attachments: PLSI_EM.patch
>>
>>
>> Create a simple prototype implementing Expectation Maximization (EM)
>> that demonstrates the algorithm's functionality given a set of (user,
>> click-url) data.
>> The prototype should be functionally complete and should serve as a
>> basis for the Map-Reduce version of the EM algorithm.