Agreed. Frank's and others' suggestion of an end-to-end early cut is what
we are thinking, and a bit more progress has been made on that front.
Having a straw man (possibly a throw-away first cut) will help flesh out
the discussion and give clarity. There is more than one way to skin this
cat, and the software will evolve rapidly from there.

To be fair, a bit of whiteboarding over a hangout will help shortly after.
The feedback and fears are important to hear: aspirations, nuances of API
use, requirements, and hidden contracts in the code.

Let's also rally constructively around the amazing momentum we are building
on the project, so that the community only grows with this infusion. Having
this kind of talent in our Mahout community is going to transform the
project and bring great hackers to embrace the Scalable Machine Learning
vision.

[We are talking about some hall-of-fame talent here: Anand Avati authored
close to a million lines of open-source code over the past decade in his
last project at Red Hat; he represents Red Hat. Cliff Click is synonymous
with high-performance, production-grade JITs, JVMs, and H2O.] And there are
other great ones to come, and crazy new ones to welcome into this endeavor.
They are only just getting started: imagine weaving great distributed tree
algorithms, deep learning, ddply() from R, etc. into the core of the Mahout
experience! It's going to be a great end result.

Yea, there be dragons,
Them dragons shall be slayed -
cheers, Sri


On Tue, Apr 1, 2014 at 4:07 PM, Ted Dunning <[email protected]> wrote:

> I think the Job structure that we have is a major problem because it
> assumes input and output to files.
>
> We don't have to mix distributed and local code to produce a distributed
> matrix that is manipulated locally with no reference to the cluster.
>
> I also think that discussion in a vacuum won't go forward.  I will work
> with Anand and Cliff to produce a prototype of what I mean, for critique.
>
> On Apr 1, 2014, at 1:16 PM, Frank Scholten <[email protected]> wrote:
>
> I also like Anand's idea of creating an h2o alternative to a Hadoop job. I
> would like to see this implemented as a Java bean with a separate CLI
> driver so that it is easy to use from Java. Current Mahout jobs have to be
> called via main methods with String arrays. See lucene2seq as an example
> of the bean config idea.
>
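
[A minimal sketch of the bean-plus-CLI-driver pattern Frank describes
above. Class and field names here are hypothetical stand-ins, not Mahout's
actual lucene2seq code, and the conversion logic is elided:]

// Illustrative only: the job is a plain Java bean with typed setters, so
// it can be configured and run directly from Java code without going
// through a main(String[]) entry point.
public class LuceneToSeqJob {
    private String indexDir;    // Lucene index to read from
    private String outputDir;   // where the sequence files are written
    private String idField;     // document id field name
    private String textField;   // field holding the document text

    public void setIndexDir(String indexDir)   { this.indexDir = indexDir; }
    public void setOutputDir(String outputDir) { this.outputDir = outputDir; }
    public void setIdField(String idField)     { this.idField = idField; }
    public void setTextField(String textField) { this.textField = textField; }

    public void run() {
        // conversion logic would live here; callers never touch String[] args
    }
}

// The CLI driver (in its own file) is the only place that parses String[];
// everything else programs against the bean.
public class LuceneToSeqDriver {
    public static void main(String[] args) {
        LuceneToSeqJob job = new LuceneToSeqJob();
        job.setIndexDir(args[0]);
        job.setOutputDir(args[1]);
        job.setIdField(args[2]);
        job.setTextField(args[3]);
        job.run();
    }
}

The point being that Java callers configure and invoke the bean directly,
and only the driver ever sees a String array.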



-- 
ceo & co-founder, 0xdata Inc <http://www.0xdata.com/>
+1-408.316.8192
