Good questions, Aditya, and awesome response, Dustin et al.

I'm back in, and trying to work my way through emails I missed while out.

The Meetup presentation referenced is available in full here:
https://github.com/rawkintrevo/presentations/blob/master/Mahout%20Whats%20Next%20DFW%20Meetup.pdf

Hopefully that will be a somewhat useful "structure" overview.

To all watching, the write-ups I have mentioned are a series of blog posts
I intend to push out ASAP, specifically aimed at new users (to Aditya's
point number 6). At the moment they are incomplete, poorly edited, unclear,
and possibly incorrect in spots. I promise to publish once they are clean!

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Mon, Apr 3, 2017 at 3:41 PM, dustin vanstee <dustinvans...@gmail.com>
wrote:

> Hi Aditya, I am new to the project myself so I can't comment on all your
> questions but here are a few comments I have for you ..
>
> 1. High level structure of Mahout
> Trevor gave a presentation at a meetup that had a nice architecture
> diagram that shows the layers.
>
> Mainly it's about using the Samsara DSL to write backend-agnostic
> algorithms, then letting Mahout do the mapping and optimizations for
> whichever backend you are using.
>
> 3. How does contribution of a new algorithm work in Mahout? When I was
> reading the doc "Getting Started with Mahout" the example implemented the
> Ordinary Least Squares Regression in Samsara, Mahout's DSL.
> I had something different in my mind before reading the blog posts. I had
> thought that I would be contributing the distributed algorithm to Mahout
> from scratch, written in Scala and make it available as a package (which
> users can import and use) to users who use Mahout.
>
> I think the idea is to let the backend engine figure out how to best
> distribute the work. That said, when writing a binding to a particular
> backend, a lot of work probably goes into finding the best way to
> represent a DRM.
>
> 4. In general, is there a plan to contribute the algorithms in future using
> Samsara only? If so, what will be the limitations and advantages of this
> decision? I mean, the algorithms that will be a part of Mahout in the
> future, is there a plan to write all of them in Samsara.
>
> I think that's where the sweet spot is ... backend-agnostic code.
>
>
> 6. What is expected of a newbie in the community? What is the learning
> curve to become an active contributor to Mahout? Are there any specific
> books / blog posts that I can read that will make the process easier?
>
> As a newbie, I think it's participating in the building/testing of code
> releases, and also working on some simple JIRAs. Based on my experience,
> working on my first JIRA is helping me get more familiar with some small
> aspects of the overall project. I think you will need to get good with
> IntelliJ to help you read/write/test code. I perused Trevor's documents
> and all the write-ups on the Mahout website. Beyond that, just trying
> things in code will help.
>
>
> Sorry, I don't have tons of answers myself, but this is what I have found
> out so far. Hope that helps.
>
>
> On Fri, Mar 31, 2017 at 7:47 PM, Aditya <adityasarma...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I've been talking with Trevor over email and he shared some documents with
>> me. They contained content that he (along with a few others) was
>> developing to make Mahout easily accessible to newbies like myself.
>>
>> I've gone through the planned blog posts titled "Why Mahout", "Getting
>> Started with Mahout", "Algorithms Framework" and "Building Apache Mahout
>> from Source" and I have to say, I've got a lot of questions. Since Trevor
>> is on vacation and the deadline for final proposal submission is fast
>> approaching, I thought I'd post my questions on the dev forum.
>>
>> So here goes the big list of my questions. I hope those of you who were
>> / are involved in the development of these blog posts will be able to help
>> me. Some of the questions are vague / abstract, so I suggest you answer
>> them as if you're explaining it to a layman.
>>
>> 1. Could you explain the high-level structure of Mahout to me?
>>
>> 2. What are the plans in pipeline for Mahout's development in the months
>> to come?
>>
>> 3. How does contribution of a new algorithm work in Mahout? When I was
>> reading the doc "Getting Started with Mahout" the example implemented the
>> Ordinary Least Squares Regression in Samsara, Mahout's DSL.
>> I had something different in my mind before reading the blog posts. I had
>> thought that I would be contributing the distributed algorithm to Mahout
>> from scratch, written in Scala and make it available as a package (which
>> users can import and use) to users who use Mahout.
>>
>> 4. In general, is there a plan to contribute the algorithms in future
>> using Samsara only? If so, what will be the limitations and advantages of
>> this
>> decision? I mean, the algorithms that will be a part of Mahout in the
>> future, is there a plan to write all of them in Samsara.
>>
>> 5. What are the building blocks of Mahout that enable the distributed
>> processing? The blog post mentions the Distributed Row Matrix. Are there
>> any other distributed data structures available? If not, won't the
>> algorithms that can be a part of the Mahout framework in the future become
>> limited? Meaning, algorithms that cannot be reduced to a Linear Algebra
>> problem?
>>
>> 6. What is expected of a newbie in the community? What is the learning
>> curve to become an active contributor to Mahout? Are there any specific
>> books / blog posts that I can read that will make the process easier?
>>
>> 7. Also, if you could give me some background as to how the development of
>> Mahout has been going on. Not the motivation / inspiration that led to
>> Mahout's conception but something like, what work has gone on between the
>> previous release and the current release candidate.
>>
>> 8. What was the high-level motivation for developing Mahout's own DSL,
>> Samsara?
>>
>> Regards,
>> Aditya
>>
>
>
