Hi Natallia,

Have a look at https://issues.apache.org/jira/browse/MAHOUT-9. I am hoping to have something to put up after ApacheCon Europe, and at that point testing and help would be appreciated. So I am not sure whether it makes sense as a GSoC project. Perhaps you would be interested in looking into other algorithms? I think there are other interesting things besides the "10" in our NIPS paper, and perhaps we can start brainstorming on them. These are the ones that interest me, although I don't yet know how well they can be implemented in M/R:

1. TextRank (see Mihalcea), although I doubt this is a full summer project
2. Brill POS tagger
3. Maximum entropy models

Of course, we don't have to limit ourselves to M/R approaches. We can also look into algorithms that require more general distributed capabilities.

Cheers,
Grant

On Mar 30, 2008, at 9:50 AM, Vitalisova, Natallia wrote:

Hi,

My name is Natallia Vitalisova, and I've applied for Google SoC 2008 to implement the Naïve Bayes algorithm on Hadoop.

Whether or not I am accepted for SoC, I want to spend time investigating this topic, which I find very interesting. But I will certainly need a mentor, or someone who can help answer my questions. Is anyone willing to help?