Thanks, Grant and everybody, for the welcome. I am certainly thrilled to be able to participate in the Mahout effort as a committer.

I am currently working for a small startup company called Inadco as an architect. I've been looking for a scalable solution for LSI (among other things) for the past several years, and renewed that effort when I joined Inadco. We hope an LSI pipeline will help us assess document similarities while addressing polysemy and synonymy to some degree.

Mahout has an excellent foundation to bootstrap this process: pipelines to vectorize text documents with a custom stemmer/analyzer, compute tf/idfs, and select bigrams/trigrams based on the excellent log-likelihood method (which I think is based on Ted Dunning's 'Surprise and Coincidence' work). And all of that is capable of running on Hadoop infrastructure, which allows packing an incredible number of flops into a unit of time. My contribution builds on top of that by introducing a MapReduce-only Stochastic SVD implementation to the mix (MAHOUT-376, -593).

This has not been a big priority for the company so far, but we have run and tested the major steps of our LSI pipeline, and I think we will see it through to production in a couple of months or so, along with fold-in jobs and a somewhat "better-than-random-scanning" HBase-based vector space indexing.

Going forward, I think we also have a great interest in dyadic regressions with cold starts (we are in a situation where side information is extremely sparse), as well as hierarchical document clustering. Hopefully, some of those future efforts will result in Mahout contributions. But that's the company's roadmap; my personal roadmap of course does not have to depend on that too closely. :)

Thanks.
-Dmitriy

On Sat, Feb 12, 2011 at 9:12 AM, Grant Ingersoll <[email protected]> wrote:

> I am pleased to announce that the Mahout PMC has, in recognition of their
> continued contributions to Mahout, elected Shannon Quinn and Dmitry Lyubimov
> to be committers on the project. Please join me in giving a warm welcome!
>
> Dmitry and Shannon, it's customary for new committers to write a paragraph or
> so of introduction about themselves, if you don't mind sharing a bit about
> yourself and how you use Mahout.
>
> Thanks,
> Grant
