Re: Apache Giraph?

Lance Norskog Mon, 05 Sep 2011 03:48:17 -0700

Haha. Part of the abuse is "Google envy". (Google "Sigmund Freud" to fully
understand this.)

I'm finding inherent difficulty in documenting map/reduce code, and
assimilating an existing job. Haven't seen a "UML for Map/Reduce" yet;
Hamake is the cleanest "everything in one file" description, and it only
stores half of what's going on.

Mahout's "in-memory" code is all single-threaded, and is bifurcated from the
map/reduce versions. A few places have custom multi-threading shoehorned in.
You can't buy a stationary single-processor computer. We bought an 8-core
server 1.5 years ago for under 5 grand. We can't easily write
multi-processor Java for it. If Mahout wants to stay M/R focused it could
use an in-memory M/R executor as the "in-memory" option. Several systems
(including the Qt graphics framework!) include such a beast. It's not very
hard. The big overhead is sorting, and you often don't need sorted keys.

https://github.com/LanceNorskog/parallel/tree/master/project/src/java/parallel/littlemr
https://github.com/LanceNorskog/parallel/blob/master/project/test/java/parallel/littlemr/TestFullPass.java
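
For illustration, here is a minimal sketch of what an in-memory M/R
executor could look like in modern Java. It is not the littlemr code
linked above, and it is not a Mahout or Hadoop API: the class name
InMemoryMapReduce and the Mapper/Reducer interfaces are made up for this
example. It assumes the caller does not need sorted keys, so it groups by
hash instead of sorting, and it leans on parallel streams to use all cores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiConsumer;

/** Hypothetical in-memory map/reduce executor (illustration only). */
final class InMemoryMapReduce {

    /** Takes one input record and emits zero or more (key, value) pairs. */
    interface Mapper<I, K, V> {
        void map(I input, BiConsumer<K, V> emit);
    }

    /** Folds all values collected for one key into a single non-null result. */
    interface Reducer<K, V, R> {
        R reduce(K key, List<V> values);
    }

    static <I, K, V, R> Map<K, R> run(List<I> inputs,
                                      Mapper<I, K, V> mapper,
                                      Reducer<K, V, R> reducer) {
        // Map phase: records are processed in parallel on the common
        // fork/join pool; emitted values are grouped by key as they arrive.
        ConcurrentMap<K, Queue<V>> groups = new ConcurrentHashMap<>();
        inputs.parallelStream().forEach(record ->
            mapper.map(record, (key, value) ->
                groups.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>())
                      .add(value)));

        // Reduce phase: one reduce call per key, also in parallel.
        // Keys are grouped by hashing and never sorted, so the order of
        // keys (and of values within a key) is unspecified.
        ConcurrentMap<K, R> results = new ConcurrentHashMap<>();
        groups.entrySet().parallelStream().forEach(entry ->
            results.put(entry.getKey(),
                        reducer.reduce(entry.getKey(),
                                       new ArrayList<>(entry.getValue()))));
        return results;
    }
}
```

A word count would pass a mapper that splits each line and emits (word, 1),
and a reducer that returns the size of the value list. Skipping the sort
is the main shortcut here; a job that depends on seeing keys in order
would need a sorted map or an explicit sort between the two phases.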

About m/r's future: Riak supports doing a map/reduce job during a query.
That is, m/r is a distributed version of the classic DB stored procedure;
the query happens between the DB and the (multiple, parallel) clients. This
is a natural place for m/r, and it may live on in that context after all
Google envy fades away.

On Mon, Sep 5, 2011 at 2:13 AM, Sean Owen <[email protected]> wrote:

> My high-level view is that Hadoop was very excellent for its intended use
> case, and that because of this, people have abused it to do things quite
> unlike what it was designed for. It's amazing that a glorified logs
> processing framework could do anything like machine learning well. Mahout
> embodies that interesting struggle.
>
> I can only believe that most any of the "next gen" frameworks discussed
> here, which are necessarily more general-purpose, will be better for things
> like machine learning. I am not so interested in MR 2.0 -- nothing wrong
> with it, just not something conceptually better for machine learning. I like
> projects like Ciel from MS Research -- simply more general purpose graph-
> and data-flow-oriented frameworks.
>
> I personally believe that while Mahout *could* be anything, that it's
> reached about the level of scope it can possibly sustain given the amount
> of effort coming in, in trying to do something interesting on top of
> MapReduce. This will be useful for a couple years to come yet.
>
> That is to say: I think it will be interesting to explore another
> machine-learning-at-scale project in 2 years or so on top of one of these
> next-gen frameworks.
>
> (Was that the question?)
>

-- 
Lance Norskog
[email protected]
