I think the Job structure that we have is a major problem because it assumes input from files and output to files.

We don’t have to mix distributed and local code to produce a distributed matrix that is manipulated locally with no reference to the cluster. I also think that discussion in a vacuum won’t go forward. I will work with Anand and Cliff to produce a prototype of what I mean, for critique.

On Apr 1, 2014, at 1:16 PM, Frank Scholten <[email protected]> wrote:

I also like Anand's idea of creating an h2o alternative to a Hadoop job. I'd like to see this implemented as a Java bean with a separate CLI driver class so that it is easy to use from Java. Current Mahout jobs have to be called via main methods with String arrays. See lucene2seq as an example of the bean config idea.
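To make the bean-plus-driver idea concrete, here is a minimal sketch of the split Frank describes. Everything in it is hypothetical and only illustrates the pattern: `ExampleJob`, `ExampleJobDriver`, and the `--input`, `--output`, and `--maxIterations` flags are invented for this example and are not the actual Mahout or lucene2seq API. Configuration lives in ordinary setters on the job bean, and the CLI driver does nothing except translate `String[]` arguments into those setters.

```java
// Hypothetical job bean: all configuration is exposed through plain getters/setters,
// so Java callers can configure and run the job without building a String[].
class ExampleJob {
    private String input;
    private String output;
    private int maxIterations = 10; // hypothetical default

    public String getInput() { return input; }
    public void setInput(String input) { this.input = input; }
    public String getOutput() { return output; }
    public void setOutput(String output) { this.output = output; }
    public int getMaxIterations() { return maxIterations; }
    public void setMaxIterations(int maxIterations) { this.maxIterations = maxIterations; }

    // Placeholder for the real work: here it only validates the config and describes it.
    public String run() {
        if (input == null || output == null) {
            throw new IllegalStateException("input and output must be set");
        }
        return "Running " + maxIterations + " iterations: " + input + " -> " + output;
    }
}

// Hypothetical thin CLI driver: its only responsibility is mapping "--flag value" pairs onto bean setters.
public class ExampleJobDriver {
    static ExampleJob parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Expected --flag value pairs");
        }
        ExampleJob job = new ExampleJob();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            String value = args[i + 1];
            switch (flag) {
                case "--input": job.setInput(value); break;
                case "--output": job.setOutput(value); break;
                case "--maxIterations": job.setMaxIterations(Integer.parseInt(value)); break;
                default: throw new IllegalArgumentException("Unknown option: " + flag);
            }
        }
        return job;
    }

    public static void main(String[] args) {
        System.out.println(parse(args).run());
    }
}
```

From Java code, the driver is bypassed entirely: `ExampleJob job = new ExampleJob(); job.setInput("in"); job.setOutput("out"); job.run();`. Keeping argument parsing out of the bean means the same job can be embedded in other programs or tests without going through a `main` method.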
