Of course, FlumeJava (or the open source version, Plume) would let this same
kind of code be written for map-reduce, and it could plausibly generate code
for Spark-like execution or Giraph execution as well.

That would let us stay with Java at the cost of a few extra lines of code
when expressing a closure.
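
For concreteness, a FlumeJava-style version of Jake's pipeline below might
look roughly like this. This is only a sketch: the names PCollection, PTable,
parallelDo, DoFn, EmitFn, Pair, groupByKey, combineValues and CombineFn follow
the FlumeJava paper and may not match Plume's current API exactly, and
collectionOf(...) / tableOf(...) stand in for whatever type descriptors the
library actually wants.

  PCollection<Vector> vectors = ...;  // loaded from HDFS somehow

  // map + filter collapse into one DoFn: normalize, emit only the small vectors
  PCollection<Vector> normalized = vectors.parallelDo(
      new DoFn<Vector, Vector>() {
        public void process(Vector in, EmitFn<Vector> emit) {
          Vector v = in.normalize(1);
          if (v.numNonDefaultValues() < 1000) {
            emit.emit(v);
          }
        }
      }, collectionOf(...));

  // the global reduce becomes: key everything the same, groupByKey, combineValues
  PTable<String, Vector> sum = normalized
      .parallelDo(new DoFn<Vector, Pair<String, Vector>>() {
        public void process(Vector v, EmitFn<Pair<String, Vector>> emit) {
          emit.emit(new Pair<String, Vector>("sum", v));
        }
      }, tableOf(...))
      .groupByKey()
      .combineValues(new CombineFn<Vector>() {
        public Vector combine(Iterable<Vector> vs) {
          Vector total = null;
          for (Vector v : vs) {
            total = (total == null) ? v : total.plus(v);
          }
          return total;
        }
      });

The anonymous inner classes are the "few extra lines" I mean; the shape of the
pipeline itself is the same as what Jake wrote.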

On Mon, Sep 5, 2011 at 8:02 AM, Jake Mannix <[email protected]> wrote:

> > Another interesting model is that of Spark.  I can well imagine that much
> > of what we do could be replaced with very small Spark programs which could
> > be composed much more easily than our current map-reduce command-line
> > stuff could be glued together.  Lots of codes should experience two orders
> > of magnitude speedup from the use of these alternative systems.
> >
>
>   This is my impression too.  The more I play with Spark, the more it looks
> like "the Right Paradigm" for this kind of computation: how many years have
> I been complaining that all I've ever wanted from Hadoop (and/or Mahout) is
> to be able to say something like:
>
>  vectors = load("hdfs://mydataFile");
>  vectors.map(new Function<Vector, Vector>() {
>             Vector apply(Vector in) { return in.normalize(1); } })
>         .filter(new Predicate<Vector>() {
>             boolean apply(Vector in) { return in.numNonDefaultValues() < 1000; } })
>         .reduce(new Function<Pair<Vector, Vector>, Vector>() {
>             Vector apply(Pair<Vector, Vector> pair) { return pair.getFirst().plus(pair.getSecond()); } });
>
>  Spark lets you do exactly this kind of strongly typed, mixed OO+functional
> thinking (except without the verbosity of Java, and with proper closures).
