Hi Natallia,
Have a look at https://issues.apache.org/jira/browse/MAHOUT-9. I am
hoping to have something to put up after ApacheCon Europe, at which
point testing and help would be appreciated, so I am not sure whether
it will make sense as a GSoC project. Perhaps you would be interested
in looking into other algorithms? I think there are other interesting
things besides the "10" in our NIPS paper; perhaps we can start
brainstorming on them. Things of interest to me, although I don't know
yet how well they can be implemented in M/R, are:
1. TextRank (see Mihalcea), although I doubt this is a full summer
project
2. Brill POS tagger
3. Max entropy stuff
Of course, we don't have to just use M/R distributed approaches. We
can look into algs that require more distributed capabilities.
Cheers,
Grant
On Mar 30, 2008, at 9:50 AM, Vitalisova, Natallia wrote:
Hi,
My name is Natallia Vitalisova and I've applied for Google SoC 2008
to implement the Naïve Bayes algorithm on Hadoop.
Whether or not I am accepted for SoC, I want to spend my time
investigating this topic, which I consider very interesting.
But I will certainly need a mentor, or someone who can help
answer some questions. Is there anyone willing to help?