Re: Mahout contributions

Khurrum Nasim Thu, 28 Apr 2016 14:48:54 -0700

I agree with Andrew.   Mahout should remain indigenous.


Prakash - you may want to create your own project on GitHub using the Mahout 
library.


> On Apr 28, 2016, at 5:43 PM, Andrew Palumbo <[email protected]> wrote:
> 
> I don't think that this sort of integration work would be a good fit 
> directly to the Mahout project.  Mahout is more about math, algorithms and an 
> environment to develop algorithms.  We stay away from direct platform 
> integration.  In the past we did have some elasticsearch/mahout integration 
> work that is not in the code base for this exact reason.  I would suggest 
> that better places to contribute something like this may be: PIO 
> (https://prediction.io/), or even directly as a package for spark 
> http://spark-packages.org/ .
> 
> Projects integrating Mahout have recently been added to PIO: 
> https://github.com/PredictionIO/template-scala-parallel-universal-recommendation.
>   
> 
> I think that the project that you are proposing would be a better fit there.
> 
> Thanks,
> 
> Andy
> 
> 
> ________________________________________
> From: Saikat Kanjilal <[email protected]>
> Sent: Thursday, April 28, 2016 1:50 PM
> To: [email protected]
> Subject: Re: Mahout contributions
> 
> I want to start with social data as an example, such as data returned from 
> the FB Graph API as well as user Twitter data. I will send some samples 
> later if you're interested.
> 
> Sent from my iPhone
> 
>> On Apr 28, 2016, at 10:41 AM, Khurrum Nasim <[email protected]> wrote:
>> 
>> 
>> What JSON payload size are we talking about here?
>> 
>>> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <[email protected]> wrote:
>>> 
>>> Because EL gives you visualization and non-Lucene query constructs as 
>>> well, and it already has a REST API that I plan on tying into Mahout.  I 
>>> plan on wrapping some of the clustering algorithms that I implement using 
>>> Mahout and Spark as a service, which can then make calls into other 
>>> services (namely Elasticsearch and the Neo4j graph service).
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <[email protected]> 
>>>> wrote:
>>>> 
>>>> @Saikat - why use EL instead of Lucene directly?
>>>> 
>>>> 
>>>> 
>>>>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <[email protected]> wrote:
>>>>> 
>>>>> This is great information, thank you. Based on this recommendation I 
>>>>> won't create a JIRA yet, but will start work on my project. When the 
>>>>> code approaches the percentages you are describing, I will create the 
>>>>> appropriate JIRAs and put together a proposal to send to the list. 
>>>>> Sound OK?  Based on your latest updates to the wiki, I will work on a 
>>>>> handful of the clustering algorithms, since I see that the Spark 
>>>>> implementations for these are not yet complete.
>>>>> Thank you again
>>>>> 
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>> Subject: Re: Mahout contributions
>>>>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>>>>> 
>>>>>> Saikat,
>>>>>> 
>>>>>> One other thing that I should say is that you do not need clearance or 
>>>>>> input from the committers to begin work on your project, and the 
>>>>>> interest can and should come from the community as a whole. You can 
>>>>>> write a proposal as you've done, and if you don't see any "+1"s or 
>>>>>> responses from the community as a whole within a few days, you may want 
>>>>>> to explain in more detail and give examples and use cases.  If you are 
>>>>>> still not seeing +1s or any responses from others then I think you can 
>>>>>> assume that there may not be interest; this is usually how things work.
>>>>>> 
>>>>>> However, if it's something that you're passionate about and you feel 
>>>>>> like you can deliver, this should not stop you.  People do not always read 
>>>>>> the dev@ emails or have time to respond.  You can still move forward 
>>>>>> with your proposed contribution by following the steps laid out in my 
>>>>>> previous email; follow the protocol at:
>>>>>> 
>>>>>> http://mahout.apache.org/developers/how-to-contribute.html
>>>>>> 
>>>>>> and create a JIRA.  When you have reached a significant amount of 
>>>>>> completion (around 70-80%), open a PR for review; this way you can 
>>>>>> explain in more detail.
>>>>>> 
>>>>>> But please realize that when you open a JIRA for a new issue there is 
>>>>>> some expectation of a commitment on your part to complete it.
>>>>>> 
>>>>>> For example, I am currently investigating some new plotting features.  I 
>>>>>> have spent a good deal of time this week and last already and am even 
>>>>>> mocking up code as a sketch of what may become an implementation before 
>>>>>> I open a "New Feature" JIRA for it.
>>>>>> 
>>>>>> My point is absolutely not to discourage you or anybody else from 
>>>>>> opening JIRAs for new features, but rather to let you know that when 
>>>>>> you open a JIRA for a new issue, it tells others that you are working 
>>>>>> on it, and thus may discourage someone else with a similar idea from 
>>>>>> contributing this feature.  So it is best to open it once you've begun 
>>>>>> your work and are committed to it.
>>>>>> 
>>>>>> Andy
>>>>>> 
>>>>>> ________________________________________
>>>>>> From: Saikat Kanjilal <[email protected]>
>>>>>> Sent: Wednesday, April 27, 2016 8:24 PM
>>>>>> To: [email protected]
>>>>>> Subject: RE: Mahout contributions
>>>>>> 
>>>>>> Andrew,
>>>>>> 
>>>>>> Thank you very much for your input. I actually want to start a new set 
>>>>>> of JIRAs. Here's what I want to work on: a framework that ties together 
>>>>>> search/visualization capability with some machine learning algorithms. 
>>>>>> Essentially, think of it as tying Elasticsearch and Kibana into Mahout: 
>>>>>> the user can search for their data with Elasticsearch, and for deeper 
>>>>>> analysis they can feed that data into one or more Mahout backends.  
>>>>>> Another interesting tie-in might be to hack Kibana to render 
>>>>>> ggplot-like graphics based on the output of Mahout algorithms (assuming 
>>>>>> this can be a Kibana plugin).
>>>>>> 
>>>>>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if 
>>>>>> there's interest in this initiative.  The tool will bring together the 
>>>>>> ELK stack with dynamic machine learning algorithms.  I can go into a lot 
>>>>>> more detail around use cases if there's enough interest.
>>>>>> 
>>>>>> Looking forward to your and other committers' input.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> From: [email protected]
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Mahout contributions
>>>>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>>>>> 
>>>>>>> Hello Saikat,
>>>>>>> 
>>>>>>> #1 and #2 above are already implemented.  #4 is tricky, so I would not 
>>>>>>> recommend it without a strong knowledge of the codebase, and #5 is now 
>>>>>>> deprecated.  (I've just updated the algorithms grid to reflect this).  
>>>>>>> The algorithms page includes both algorithms implemented in the 
>>>>>>> math-scala library and algorithms which have CLI drivers written for 
>>>>>>> them.
>>>>>>> 
>>>>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>> 
>>>>>>> And please note that per that documentation, it is in everybody's best 
>>>>>>> interest to keep messages on list, contacting committers directly is 
>>>>>>> discouraged.
>>>>>>> 
>>>>>>> The best way to contribute (if you have not found a new bug or issue) 
>>>>>>> would be for you to pick a single open issue in the mahout JIRA which 
>>>>>>> is not already assigned, and start work on it.  When your work is ready 
>>>>>>> for review, just open up a PR and the committers will review it.  
>>>>>>> Please note that if you do pick up an issue to work on, we do expect 
>>>>>>> some amount of responsibility and reliability, and a tangible amount of 
>>>>>>> satisfactory work, since once you've marked a JIRA as something you're 
>>>>>>> working on, others will pass on it.
>>>>>>> 
>>>>>>> Another good way to contribute would be to look for enhancements that 
>>>>>>> you could make to existing code, not necessarily open JIRAs that need to 
>>>>>>> be assigned to you.  For example, please see the recent contribution and 
>>>>>>> workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>>>>> 
>>>>>>> If you have something new that you'd like to implement, simply start a 
>>>>>>> new JIRA issue and begin work on it.  In this case, when you have some 
>>>>>>> code that is ready for review,  you can simply open up a PR for it and 
>>>>>>> committers will review it.  For new implementations, we generally say 
>>>>>>> that you should do this when you are at least 70-80% finished with your 
>>>>>>> coding.
>>>>>>> 
>>>>>>> Thank You,
>>>>>>> 
>>>>>>> Andy
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ________________________________________
>>>>>>> From: Saikat Kanjilal <[email protected]>
>>>>>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: RE: Mahout contributions
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> Following up on my last email with more specifics: I've looked through 
>>>>>>> the wiki (https://mahout.apache.org/users/basics/algorithms.html) and 
>>>>>>> I'm interested in implementing one or more of the following algorithms 
>>>>>>> with Mahout using Spark:
>>>>>>> 
>>>>>>> 1) Matrix Factorization with ALS
>>>>>>> 2) Naive Bayes
>>>>>>> 3) Weighted Matrix Factorization, SVD++
>>>>>>> 4) Sparse TF-IDF Vectors from Text
>>>>>>> 5) Lucene integration
>>>>>>> 
>>>>>>> I had a few questions:
>>>>>>> 
>>>>>>> 1) Which of these should I start with, and where is there the greatest 
>>>>>>>    need?
>>>>>>> 2) Should I fork the repo and create branches for each of the above 
>>>>>>>    implementations?
>>>>>>> 3) Should I go ahead and create some JIRAs for these?
>>>>>>> 
>>>>>>> I would love to have some pointers to get started.
>>>>>>> 
>>>>>>> Regards
>>>>>>> 
>>>>>>> From: [email protected]
>>>>>>> To: [email protected]
>>>>>>> Subject: Mahout contributions
>>>>>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Hello Committers,
>>>>>>> 
>>>>>>> I was looking through the current JIRA tickets and was wondering if 
>>>>>>> there's a particular area of Mahout that needs more help than others. 
>>>>>>> Should I focus on contributing some algorithms using the DSL, or on 
>>>>>>> Samsara-related efforts?  I've finally got some bandwidth to do some 
>>>>>>> work and would love some guidance before assigning myself some tickets.
>>>>>>> 
>>>>>>> Regards
>>
