+1 and agree with ssc's suggestion.


Sent from my iPhone

> On Apr 7, 2014, at 3:30 AM, Sebastian Schelter <s...@apache.org> wrote:
> 
> I agree that the state of the MR code is something that needs to be 
> addressed. There have been several attempts to rework/refactor it, but none 
> of them had a satisfactory result unfortunately.
> 
> I'm hearing that there is a lack of a coherent vision for the future of 
> Mahout. Let me suggest a radical one.
> 
> - call the next release 0.10 not 1.0, as the latter implies a maturity which 
> does not reflect the radical changes I'm proposing
> 
> - move all the MR code to a new Maven module, deprecate it and announce that 
> we will delete it in the release after 0.11
> 
> - make the new DSL the heart of Mahout, aim for the following algorithms to 
> be implemented in the DSL as a new basis:
> 
> Collaborative Filtering:
> 
> * Cooccurrence-based recommender (work started in MAHOUT-1464)
> * ALS (work started in MAHOUT-1365)
> 
> Clustering:
> 
> * k-Means
> * Streaming k-Means
> 
> Classification:
> 
> * NaiveBayes (work started in MAHOUT-1493)
> * either Random Forests or an ensemble of SGD classifiers
> 
> Dimensionality Reduction / Topic Models
> 
> * SSVD (prototype in trunk)
> * PCA (prototype in trunk)
> * LDA
> 
> 
> - integrate Stratosphere / H2O as follows:
> 
> * the Stratosphere guys can choose to implement the physical operators of the 
> DSL to make our algos run on Stratosphere. If they do, this is great for 
> Mahout as it allows people to run code on different backends. If they don't, 
> we don't lose anything.
> 
> * a major point in porting the algorithms to the DSL would be to make the 
> input formats of all algorithms consistent. That would allow H2O to work off 
> the same inputs as the Scala DSL.
> 
> Let me know what you think.
> 
> -s
> 
> 
> 
> 
> 
>> On 04/06/2014 05:54 PM, Sean Owen wrote:
>> On Sun, Apr 6, 2014 at 4:16 PM, Andrew Musselman
>> <andrew.mussel...@gmail.com> wrote:
>>> Seems to me there has been a renewed effort to eat our broccoli, along with
>>> the other ideas people have been bringing on board.
>>> 
>>> What are you proposing to put in the board report?
>> 
>> I have not seen significant activity to unify or update the existing
>> code. It's still the same different chunks with different styles,
>> input/output, distributed/not, etc. The doc updates look very
>> positive. To be fair the task of really addressing the technical debt
>> is very large, so even making a dent would be a lot of work. A
>> clean-slate reboot therefore actually seems like a good plan, but
>> that's another question...
>> 
>> Concretely, in a board report, I personally would not agree with
>> representing the Spark or H2O work as an agreed future plan or
>> roadmap, right now. Being in the board report makes that impression,
>> as have recent articles/tweets I've seen, so it deserves care. That's
>> why I chimed in, maybe tilting at windmills.
>> 
>> From where I sit with customers, the overall impression is negative
>> among those that have tried to use the code, and usage has gone from
>> few to almost none. I doubt my sample is so different from the whole
>> user population. Much of it is consistency/quality, but some of it's
>> just an interest in non-M/R frameworks.
>> 
>> So, I think the current state and set of problems is far more
>> important to acknowledge in a board report than just mentioning some
>> future possibilities, and the latter was the impression I got of the
>> likely content. In fact, it makes the talk about large upcoming
>> possible changes make so much more sense.
> 
