Re: [jira] Commented: (MAHOUT-537) Bring DistributedRowMatrix into compliance with Hadoop 0.20.2

Ted Dunning Sat, 06 Nov 2010 09:13:44 -0700

Remember Flume != FlumeJava.

Flume is Cloudera's semi-proprietary ETL system.

FlumeJava is Google's high-level Java API for creating map-reduce programs.
Its level of abstraction is similar to Pig's.

Plume is an open source project I started to clone FlumeJava by filling in
the details omitted from the Google paper.  As an example of how high-level
Plume is, word count in raw map-reduce is more than 200 lines of code.  In
Plume it is about 20, and you can't tell which version of Hadoop, if any,
your code is running on.
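
To make the "can't tell which version of Hadoop" point concrete, here is a
minimal sketch of word count written against a collection-style abstraction.
The names Coll, flatMap, count and countWords are hypothetical, invented for
this illustration; this is not Plume's or FlumeJava's actual API, and the
sketch runs in memory rather than on Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class WordCountSketch {

    /** Hypothetical stand-in for a parallel collection; NOT Plume's real API. */
    static final class Coll<T> {
        private final List<T> items;

        Coll(List<T> items) {
            this.items = items;
        }

        /** Apply fn to every element and concatenate the resulting lists. */
        <R> Coll<R> flatMap(Function<T, List<R>> fn) {
            List<R> out = new ArrayList<>();
            for (T item : items) {
                out.addAll(fn.apply(item));
            }
            return new Coll<>(out);
        }

        /** Count equal elements in memory, keeping first-seen order. */
        Map<T, Long> count() {
            Map<T, Long> counts = new LinkedHashMap<>();
            for (T item : items) {
                counts.merge(item, 1L, Long::sum);
            }
            return counts;
        }
    }

    /** Word count written only against the abstraction; no Hadoop types appear. */
    static Map<String, Long> countWords(List<String> lines) {
        return new Coll<String>(lines)
            .flatMap(line -> Arrays.asList(line.trim().split("\\s+")))
            .count();
    }

    public static void main(String[] args) {
        // Prints {to=2, be=2, or=1, not=1}
        System.out.println(countWords(Arrays.asList("to be or not", "to be")));
    }
}
```

The user-facing part is countWords; everything about how the work is executed
sits behind Coll, which is where a local or cluster backend could be swapped
in without touching the caller.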

On Fri, Nov 5, 2010 at 7:09 AM, Grant Ingersoll <[email protected]> wrote:

> The Plume/Flume stuff seems promising for helping with that as well as
> giving some other benefits, but that relies on us having an open source
> version of Flume (which Ted and others have started).  I don't know that it
> is all that practical in the short term and I'm not proposing any rewrites
> at this point, but we should consider it, as working at that layer might
> allow us to plug in different backends that perform better given
> certain setups (local, small cluster, large cluster).  Such a bit of
> insulation might allow us to plug in other capabilities as well.  One of the
> things Hadoop has spawned is a whole lot more interest in these kinds of
> capabilities and I fully expect to see new/related paradigms coming out.
> Obviously, we aren't just going to jump on anything, but we can think
> about ways we might be able to plug them in.  Thoughts?
