Good questions Aditya, and awesome response Dustin et al. I'm back in, and trying to work my way through emails I missed while out.
The Meetup presentation referenced is available in full here:
https://github.com/rawkintrevo/presentations/blob/master/Mahout%20Whats%20Next%20DFW%20Meetup.pdf

Hopefully that will be a somewhat useful "structure" overview.

To all watching, the write-ups I have mentioned are a series of blog posts I intend to push out ASAP, specifically aimed at new users (to Aditya's point number 6). At the moment they are incomplete/poorly edited/unclear/possibly incorrect in spots. I promise to publish once they are clean!

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Mon, Apr 3, 2017 at 3:41 PM, dustin vanstee <dustinvans...@gmail.com> wrote:

> Hi Aditya, I am new to the project myself so I can't comment on all your
> questions, but here are a few comments I have for you.
>
> 1. High level structure of Mahout
> Trevor gave a presentation at a meetup that had a nice architecture
> diagram that shows the layers.
>
> Mainly it's using the Samsara DSL to write backend-agnostic algorithms,
> then letting Mahout do the mapping and optimizations to the backend based
> on which one you are using ...
>
> [image: Inline image 1]
>
> 3. How does contribution of a new algorithm work in Mahout? When I was
> reading the doc "Getting Started with Mahout" the example implemented the
> Ordinary Least Squares Regression in Samsara, Mahout's DSL.
> I had something different in my mind before reading the blog posts. I had
> thought that I would be contributing the distributed algorithm to Mahout
> from scratch, written in Scala and make it available as a package (which
> users can import and use) to users who use Mahout.
>
> I think the idea is to let the backend engine figure out how to best
> distribute the work.
> That said, when writing a binding to a particular backend, a lot of work
> is probably put into the best implementation of how to represent a DRM.
>
> 4. In general, is there a plan to contribute the algorithms in future using
> Samsara only? If so, what will be the limitations and advantages of this
> decision? I mean, the algorithms that will be a part of Mahout in the
> future, is there a plan to write all of them in Samsara.
>
> I think that's where the sweet spot is ... backend-agnostic code.
>
> 6. What is expected of a newbie in the community? What is the learning
> curve to become an active contributor to Mahout? Are there any specific
> books / blog posts that I can read that will make the process easier?
>
> As a newbie, I think it's participating in the building/testing of code
> releases. Also working on some simple JIRAs. Based on my experience,
> working on my first JIRA is helping me get more familiar with some small
> aspects of the overall project. I think you will need to get good with
> IntelliJ to help you read/write/test code. I perused Trevor's documents
> and all the writeups on the Mahout website. Beyond that, just trying
> things in code will help.
>
> Sorry, I don't have tons of answers myself, but this is what I have found
> out so far. Hope that helps.
>
> On Fri, Mar 31, 2017 at 7:47 PM, Aditya <adityasarma...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I've been talking with Trevor over email and he shared some documents with
>> me. They contained content that he (along with a few others) was
>> developing to make Mahout easily accessible to newbies like myself.
>>
>> I've gone through the planned blog posts titled "Why Mahout", "Getting
>> Started with Mahout", "Algorithms Framework" and "Building Apache Mahout
>> from Source" and I have to say, I've got a lot of questions. Since Trevor
>> is on vacation and the deadline for final proposal submission is fast
>> approaching, I thought I'd post my questions on the dev forum.
>>
>> So here goes the big list of my questions. I hope those of you who were
>> / are involved in the development of these blog posts will be able to help
>> me. Some of the questions are vague / abstract; I suggest you answer them
>> as if you're explaining it to a layman.
>>
>> 1. Could you elaborate to me the high-level structure of Mahout?
>>
>> 2. What are the plans in the pipeline for Mahout's development in the
>> months to come?
>>
>> 3. How does contribution of a new algorithm work in Mahout? When I was
>> reading the doc "Getting Started with Mahout" the example implemented the
>> Ordinary Least Squares Regression in Samsara, Mahout's DSL.
>> I had something different in my mind before reading the blog posts. I had
>> thought that I would be contributing the distributed algorithm to Mahout
>> from scratch, written in Scala and make it available as a package (which
>> users can import and use) to users who use Mahout.
>>
>> 4. In general, is there a plan to contribute the algorithms in future
>> using Samsara only? If so, what will be the limitations and advantages of
>> this decision? I mean, the algorithms that will be a part of Mahout in the
>> future, is there a plan to write all of them in Samsara.
>>
>> 5. What are the building blocks of Mahout that enable the distributed
>> processing? The blog post mentions the Distributed Row Matrix. Are there
>> any other distributed data structures available? If not, won't the
>> algorithms that can be a part of the Mahout framework in the future become
>> limited? Meaning, algorithms that cannot be reduced to a Linear Algebra
>> problem?
>>
>> 6. What is expected of a newbie in the community? What is the learning
>> curve to become an active contributor to Mahout? Are there any specific
>> books / blog posts that I can read that will make the process easier?
>>
>> 7. Also, if you could give me some background as to how the development of
>> Mahout has been going on.
>> Not the motivation / inspiration that led to
>> Mahout's conception but something like, what work has gone on between the
>> previous release and the current release candidate.
>>
>> 8. What was the high level motivation of developing Mahout's own DSL,
>> Samsara?
>>
>> Regards,
>> Aditya