Hi, it's great to see an ML project started at Apache!
I have a little bit of background in applying ML to various NLP tasks such as text classification, POS tagging and entity detection. I'm not an ML algorithms guy, though. I'm wondering if these kinds of tasks are among those you guys had in mind when you started this project?

If yes, then I have a follow-up question. In these NLP tasks, choosing and extracting the right kinds of features is just as important as the actual learning algorithm you employ. Any thoughts on that? Would these kinds of feature selection tasks be in scope for Mahout, or would you consider that a separate problem to be dealt with elsewhere?

Anyway, I'll certainly hang out here and see where this is going. If things are happening around text/NLP, I may be able to contribute.

I'd also like to mention that over in the UIMA incubator project, we have a sandbox project going that does Hidden Markov Model-based POS tagging, with promising results. Not sure if there can be any synergies there. I didn't see HMMs mentioned in the map/reduce paper, and I understand this stuff too little to know if they fit the Statistical Query model.

--Thilo
