Hi,

it's great to see an ML project started at Apache!

I have a little bit of background in applying ML to various
NLP tasks such as text classification, POS tagging and
entity detection.  I'm not an ML algorithms guy, though.  I'm
wondering if these kinds of tasks are among those you guys
had in mind when you started this project?

If yes, then I have a follow-up question.  In these NLP tasks,
choosing and extracting the right kinds of features is just
as important as the actual learning algorithm you employ.  Any
thoughts on that?  Would these kinds of feature selection
tasks be in scope for Mahout, or would you consider that
a separate problem to be dealt with elsewhere?

Anyway, I'll certainly hang out here and see where this is
going.  If things are happening around text/NLP, I may be able
to contribute.

I'd also like to mention that over in the UIMA incubator
project, we have a sandbox project going that does Hidden Markov
Model-based POS tagging, with promising results.  Not sure if
there can be any synergies there.  I didn't see HMMs mentioned
in the map/reduce paper, and I understand this stuff too little
to know whether they fit the Statistical Query model.

--Thilo
