Haha. Part of the abuse is "Google envy". (Google "Sigmund Freud" to fully understand this.)

I'm finding it inherently difficult to document map/reduce code and to assimilate an existing job. I haven't seen a "UML for Map/Reduce" yet; Hamake is the cleanest "everything in one file" description, and it only stores half of what's going on.

Mahout's "in-memory" code is all single-threaded, and is bifurcated from the map/reduce versions. A few places have custom multi-threading shoehorned in. You can't buy a single-processor desktop computer any more: we bought an 8-core server 1.5 years ago for under 5 grand, yet we can't easily write multi-processor Java for it.

If Mahout wants to stay M/R focused, it could use an in-memory M/R executor as the "in-memory" option. Several systems (including the Qt graphics framework!) include such a beast. It's not very hard. The big overhead is sorting, and you often don't care about the ordering.

https://github.com/LanceNorskog/parallel/tree/master/project/src/java/parallel/littlemr
https://github.com/LanceNorskog/parallel/blob/master/project/test/java/parallel/littlemr/TestFullPass.java

About m/r's future: Riak supports running a map/reduce job during a query. That is, m/r is a distributed version of the classic DB stored procedure; the query happens between the DB and the (multiple, parallel) clients. This is a natural place for m/r, and it may live on in that context after all the Google envy fades away.

On Mon, Sep 5, 2011 at 2:13 AM, Sean Owen <[email protected]> wrote:
> My high-level view is that Hadoop was very excellent for its intended use
> case, and that because of this, people have abused it to do things quite
> unlike what it was designed for. It's amazing that a glorified logs
> processing framework could do anything like machine learning well. Mahout
> embodies that interesting struggle.
>
> I can only believe that most any of the "next gen" frameworks discussed
> here, which are necessarily more general-purpose, will be better for things
> like machine learning.
> I am not so interested in MR 2.0 -- nothing wrong
> with it just not something better conceptually for machine learning. I like
> projects like Ciel from MS Research -- simply more general purpose graph-
> and data-flow-oriented frameworks.
>
> I personally believe that while Mahout *could* be anything, that it's
> reached about the level of scope it can possibly sustain given the amount
> of effort coming in, in trying to do something interesting on top of
> MapReduce.
> This will be useful for a couple years to come yet.
>
> That is to say: I think it will be interesting to explore another
> machine-learning-at-scale project in 2 years or so on top of one of these
> next-gen frameworks.
>
> (Was that the question?)
>

--
Lance Norskog
[email protected]
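
P.S. To make the in-memory M/R executor idea above concrete, here is a minimal sketch. It is not the littlemr code linked above and not anything in Mahout; the class name InMemoryMapReduce, the Mapper and Reducer interfaces, and the run method are all made up for illustration. It assumes the whole input fits in memory, runs map and reduce tasks on a fixed thread pool, and groups values by hashing instead of sorting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical sketch of an in-memory map/reduce executor (names are assumptions).
class InMemoryMapReduce {
    // Emits zero or more (key, value) pairs for one input record.
    interface Mapper<I, K, V> { void map(I input, BiConsumer<K, V> emit); }

    // Folds all values collected for one key into a single result.
    interface Reducer<K, V, R> { R reduce(K key, List<V> values); }

    static <I, K, V, R> Map<K, R> run(List<I> inputs, Mapper<I, K, V> mapper,
                                      Reducer<K, V, R> reducer, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // "Shuffle" without sorting: group emitted values by key in a concurrent hash map.
            ConcurrentMap<K, Queue<V>> groups = new ConcurrentHashMap<>();
            BiConsumer<K, V> emit = (k, v) ->
                groups.computeIfAbsent(k, x -> new ConcurrentLinkedQueue<>()).add(v);

            // Map phase: one task per input record; wait for all of them to finish.
            List<Future<?>> mapTasks = new ArrayList<>();
            for (I input : inputs) {
                Runnable task = () -> mapper.map(input, emit);
                mapTasks.add(pool.submit(task));
            }
            for (Future<?> f : mapTasks) {
                f.get();
            }

            // Reduce phase: one task per distinct key.
            Map<K, Future<R>> reduceTasks = new HashMap<>();
            for (Map.Entry<K, Queue<V>> e : groups.entrySet()) {
                Callable<R> task = () -> reducer.reduce(e.getKey(), new ArrayList<>(e.getValue()));
                reduceTasks.put(e.getKey(), pool.submit(task));
            }
            Map<K, R> results = new HashMap<>();
            for (Map.Entry<K, Future<R>> e : reduceTasks.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Usage example: word count over three lines.
    public static void main(String[] args) throws Exception {
        Mapper<String, String, Integer> tokenize = (line, emit) -> {
            for (String word : line.split(" ")) emit.accept(word, 1);
        };
        Reducer<String, Integer, Integer> sum = (word, ones) -> {
            int total = 0;
            for (int one : ones) total += one;
            return total;
        };
        Map<String, Integer> counts = run(Arrays.asList("a b a", "b c", "a"), tokenize, sum, 4);
        System.out.println(new TreeMap<>(counts)); // prints {a=3, b=2, c=1}
    }
}
```

The values for a key arrive in whatever order the map threads emit them, so this only suits reducers that don't depend on value order. If a job does need keys in sorted order, copying the result into a TreeMap (as main does for printing) is one way to pay for the sort only when it matters.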
