OK, so, getting back to this: what do you think would be a good way to solve ALS-like issues within the Mahout context?
I see just the following options:

a) Wait for YARN plus whatever bulk parallel environment gets built for it?
b) Introduce adapters to synchronous or dynamic bulk parallel distributed environments. If yes, which ones? Is it worth a try to step there? Is it a good idea to collaborate with non-Apache projects here?
c) Do nothing (no good ALS in Mahout)?

I would happily explore b) and open a discussion on it if the majority supported it. I guess I am fundamentally fine with c) too :) I feel a) is not really an option and is in a way equivalent to c), since it involves an unspecified amount of waiting for unspecified things.

On Mon, Mar 11, 2013 at 1:54 PM, Sebastian Schelter <[email protected]> wrote:

> I spent the last months working on the Stratosphere system, which is
> developed by my group. It's a research prototype, but it's got so many
> of the things that we would need.
>
> It extends the MapReduce model. For joins, e.g., there is a new operator
> called 'Match' which lets you apply your user code to the result of an
> equi-join. The nice thing is that the system automatically chooses an
> efficient execution strategy for the join. Having something like this
> production ready would save us so much code, as a lot of our
> implementations consist of hand-coded joins.
