Hi everyone,

I've been talking with Trevor over email, and he shared some documents with me. They contain content that he, along with a few others, was developing to make Mahout easily accessible to newbies like myself.

I've gone through the planned blog posts titled "Why Mahout", "Getting Started with Mahout", "Algorithms Framework" and "Building Apache Mahout from Source", and I have to say, I've got a lot of questions. Since Trevor is on vacation and the deadline for final proposal submission is fast approaching, I thought I'd post my questions on the dev forum.

So here goes the big list. I hope those of you who were or are involved in developing these blog posts will be able to help me. Some of the questions are vague or abstract; I suggest you answer them as if you were explaining to a layman.

1. Could you explain the high-level structure of Mahout?
2. What plans are in the pipeline for Mahout's development in the coming months?
3. How does contributing a new algorithm to Mahout work? The example in the "Getting Started with Mahout" doc implemented Ordinary Least Squares regression in Samsara, Mahout's DSL. I had something different in mind before reading the blog posts: I thought I would write the distributed algorithm from scratch in Scala and make it available as a package that Mahout users can import and use.
4. In general, is there a plan to write all algorithms that become part of Mahout in the future in Samsara only? If so, what are the advantages and limitations of this decision?
5. What are the building blocks of Mahout that enable distributed processing? The blog post mentions the Distributed Row Matrix. Are there any other distributed data structures available? If not, won't that limit the algorithms that can become part of the Mahout framework in the future, for example algorithms that cannot be reduced to a linear algebra problem?
6. What is expected of a newbie in the community? What is the learning curve to become an active contributor to Mahout? Are there any specific books or blog posts that would make the process easier?
7. Could you give me some background on how the development of Mahout has been going? Not the motivation or inspiration that led to Mahout's conception, but something like what work has gone on between the previous release and the current release candidate.
8. What was the high-level motivation for developing Mahout's own DSL, Samsara?

Regards,
Aditya