Hi Aditya,

I am new to the project myself, so I can't answer all of your questions, but here are a few comments.

1. High-level structure of Mahout

Trevor gave a presentation at a meetup that had a nice architecture diagram showing the layers. Mainly it's using the Samsara DSL to write backend-agnostic algorithms, then letting Mahout do the mapping and optimizations for whichever backend you are using ...

[image: Inline image 1]

> 3. How does contribution of a new algorithm work in Mahout? When I was
> reading the doc "Getting Started with Mahout" the example implemented the
> Ordinary Least Squares Regression in Samsara, Mahout's DSL.
> I had something different in my mind before reading the blog posts. I had
> thought that I would be contributing the distributed algorithm to Mahout
> from scratch, written in Scala and make it available as a package (which
> users can import and use) to users who use Mahout.

I think the idea is to let the backend engine figure out how best to distribute the work. That said, when writing a binding to a particular backend, a lot of work probably goes into finding the best way to represent a DRM (Distributed Row Matrix).

> 4. In general, is there a plan to contribute the algorithms in future using
> Samsara only? If so, what will be the limitations and advantages of this
> decision? I mean, the algorithms that will be a part of Mahout in the
> future, is there a plan to write all of them in Samsara.

I think that's where the sweet spot is ... backend-agnostic code.

> 6. What is expected of a newbie in the community? What is the learning
> curve to become an active contributor to Mahout? Are there any specific
> books / blog posts that I can read that will make the process easier?

As a newbie, I think it's participating in the building/testing of code releases, and also working on some simple JIRAs. Based on my experience, working on my first JIRA is helping me get more familiar with some small aspects of the overall project. I think you will need to get good with IntelliJ to help you read/write/test code. I perused Trevor's documents and all the writeups on the Mahout website.
Beyond that, just trying things in code will help.

Sorry, I don't have tons of answers myself, but this is what I have found out so far. Hope that helps.

On Fri, Mar 31, 2017 at 7:47 PM, Aditya <adityasarma...@gmail.com> wrote:

> Hi everyone,
>
> I've been talking with Trevor over email and he shared some documents with
> me. They contained content that he (along with a few others) were
> developing to make Mahout easily accessible to newbies like myself.
>
> I've gone through the planned blog posts titled "Why Mahout", "Getting
> Started with Mahout", "Algorithms Framework" and "Building Apache Mahout
> from Source" and I have to say, I've got a lot of questions. Since Trevor
> is on vacation and the deadline for final proposal submission is fast
> approaching, I thought I'll post my questions on the dev forum.
>
> So here goes the big list of my questions. I hope of those of you who were
> / are involved in the development of these blog posts will be able to help
> me. Some of the questions are vague / abstract, I suggest you answer them
> as if you're explaining it to a layman.
>
> 1. Could you elaborate to me the high-level structure of Mahout?
>
> 2. What are the plans in pipeline for Mahout's development in the months to
> come?
>
> 3. How does contribution of a new algorithm work in Mahout? When I was
> reading the doc "Getting Started with Mahout" the example implemented the
> Ordinary Least Squares Regression in Samsara, Mahout's DSL.
> I had something different in my mind before reading the blog posts. I had
> thought that I would be contributing the distributed algorithm to Mahout
> from scratch, written in Scala and make it available as a package (which
> users can import and use) to users who use Mahout.
>
> 4. In general, is there a plan to contribute the algorithms in future using
> Samsara only? If so, what will be the limitations and advantages of this
> decision? I mean, the algorithms that will be a part of Mahout in the
> future, is there a plan to write all of them in Samsara.
>
> 5. What are the building blocks of Mahout that enable the distributed
> processing? The blog post mentions the Distributed Row Matrix. Are there
> any other distributed data structures available? If not, won't the
> algorithms that can be a part of the Mahout framework in the future become
> limited? Meaning, algorithms that cannot be reduced to a Linear Algebra
> problem?
>
> 6. What is expected of a newbie in the community? What is the learning
> curve to become an active contributor to Mahout? Are there any specific
> books / blog posts that I can read that will make the process easier?
>
> 7. Also, if you could give me some background as to how the development of
> Mahout has been going on. Not the motivation / inspiration that led to
> Mahout's conception but something like, what work has gone on between the
> previous release and the current release candidate.
>
> 8. What was the high level motivation of developing Mahout's own DSL,
> Samsara?
>
> Regards,
> Aditya
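
P.S. A toy sketch of the "backend-agnostic" idea discussed in the thread, in plain Python rather than Samsara. None of this is Mahout's API: `Backend`, `LocalBackend`, `solve`, and `ols` are hypothetical names made up for illustration. The point is only that the algorithm (Ordinary Least Squares via the normal equations, the same example the "Getting Started with Mahout" doc uses) is written once against a small interface, and a backend decides how the heavy matrix products are computed.

```python
from typing import List, Protocol

Matrix = List[List[float]]
Vector = List[float]


class Backend(Protocol):
    """Hypothetical interface: the only operations the algorithm needs."""

    def gram(self, x: Matrix) -> Matrix: ...            # X^T X
    def xty(self, x: Matrix, y: Vector) -> Vector: ...  # X^T y


class LocalBackend:
    """In-memory stand-in; a real engine binding would distribute these sums over rows."""

    def gram(self, x: Matrix) -> Matrix:
        cols = len(x[0])
        return [[sum(row[i] * row[j] for row in x) for j in range(cols)]
                for i in range(cols)]

    def xty(self, x: Matrix, y: Vector) -> Vector:
        cols = len(x[0])
        return [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(cols)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a small dense system a * x = b by Gaussian elimination
    with partial pivoting. Assumes a is square and invertible."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(backend: Backend, x: Matrix, y: Vector) -> Vector:
    """Backend-agnostic OLS: beta = (X^T X)^-1 X^T y.
    The large products go to the backend; the small solve stays local."""
    return solve(backend.gram(x), backend.xty(x, y))
```

For example, `ols(LocalBackend(), [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [3.0, 5.0, 7.0])` fits y = 1 + 2x, with the first column of ones acting as the intercept term. Swapping in a different backend object would leave `ols` itself unchanged, which is the property the thread calls backend-agnostic.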