Of course, FlumeJava (or the open source version Plume) will let this same kind of code be written for map-reduce, and plausibly could generate code for Spark-like execution or Giraph execution. That would let us stay with Java at the cost of a few extra lines of code when expressing a closure (a rough sketch of that cost follows).
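To make the "extra lines per closure" cost concrete, here is a minimal, self-contained sketch in plain Java. It is not FlumeJava, Plume, or Spark: the Pipeline, Function, Predicate, and Combiner types are hand-rolled stand-ins (assumptions, not any library's actual API), and plain doubles stand in for Mahout Vectors so the example compiles and runs on its own.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Self-contained sketch only: Pipeline, Function, Predicate and Combiner are
    // hand-rolled stand-ins for whatever parallel collection FlumeJava/Plume/Spark
    // would provide, and doubles stand in for Mahout Vectors so this compiles and
    // runs on its own.  The point is just the anonymous-class overhead per closure.
    public class ClosureCostSketch {

      interface Function<A, B> { B apply(A in); }
      interface Predicate<A> { boolean apply(A in); }
      interface Combiner<A> { A apply(A left, A right); }

      static class Pipeline<A> {
        private final List<A> items;
        Pipeline(List<A> items) { this.items = items; }

        <B> Pipeline<B> map(Function<A, B> fn) {
          List<B> out = new ArrayList<B>();
          for (A item : items) { out.add(fn.apply(item)); }
          return new Pipeline<B>(out);
        }

        Pipeline<A> filter(Predicate<A> pred) {
          List<A> out = new ArrayList<A>();
          for (A item : items) { if (pred.apply(item)) { out.add(item); } }
          return new Pipeline<A>(out);
        }

        A reduce(Combiner<A> combiner) {
          A acc = items.get(0);
          for (int i = 1; i < items.size(); i++) { acc = combiner.apply(acc, items.get(i)); }
          return acc;
        }
      }

      public static void main(String[] args) {
        Pipeline<Double> values = new Pipeline<Double>(Arrays.asList(1.0, -2.0, 3.0, 4.0));

        Double sum = values
            .map(new Function<Double, Double>() {        // stand-in for normalize(1)
              public Double apply(Double in) { return Math.abs(in); }
            })
            .filter(new Predicate<Double>() {            // stand-in for the size filter
              public boolean apply(Double in) { return in < 1000; }
            })
            .reduce(new Combiner<Double>() {             // stand-in for vector summation
              public Double apply(Double left, Double right) { return left + right; }
            });

        System.out.println(sum);  // 10.0
      }
    }

Each stage would look the same in the real thing, just with Mahout's Vector in place of Double and the library's own distributed collection in place of Pipeline: roughly four extra lines of anonymous-class plumbing per closure.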
On Mon, Sep 5, 2011 at 8:02 AM, Jake Mannix <[email protected]> wrote:

> > Another interesting model is that of Spark. I can well imagine that much
> > of what we do could be replaced with very small Spark programs which
> > could be composed much more easily than our current map-reduce
> > command-line stuff can be glued together. Lots of code should see a
> > two-orders-of-magnitude speedup from the use of these alternative
> > systems.
>
> This is my impression too. The more I play with Spark, the more it looks
> like "the Right Paradigm" for this kind of computation: how many years
> have I been complaining that all I've ever wanted from Hadoop (and/or
> Mahout) is to be able to say something like:
>
>     vectors = load("hdfs://mydataFile");
>     vectors.map(new Function<Vector, Vector>() {
>         Vector apply(Vector in) { return in.normalize(1); }})
>       .filter(new Predicate<Vector>() {
>         boolean apply(Vector in) { return in.numNonDefaultValues() < 1000; }})
>       .reduce(new Function<Pair<Vector, Vector>, Vector>() {
>         Vector apply(Pair<Vector, Vector> pair) {
>           return pair.getFirst().plus(pair.getSecond()); }});
>
> Spark lets you do exactly this kind of strongly typed mixed OO+functional
> thinking (except without the verbosity of Java, and with proper closures).
