That's a tough question. I'd say we should only consider a) or c), as it makes no sense to depend on a research prototype system that might vanish once people get their funding cut.

On 11.03.2013 22:11, Dmitriy Lyubimov wrote:
> Ok,
>
> So, getting back, what do you think would be a good way to solve ALS-like
> issues within Mahout context?
>
> I see just the following:
>
> a) wait for Yarn + whatever bulk parallel environment built for it?
>
> b) introduce adapters to synchronous or dynamic bulk parallel distributed
> environments -- if yes, which ones? Worth a try to step there? Is it a good
> idea to collaborate with non-Apache projects here?
>
> c) do nothing (no good ALS in Mahout)?
>
> I would happily explore b and open discussion on it if majority supported
> it. I guess I am fundamentally fine with c) too :) I feel a) is not really
> an option and in a way is equivalent to c) since it involves unspecified
> amount of waiting for unspecified things.
>
> On Mon, Mar 11, 2013 at 1:54 PM, Sebastian Schelter <[email protected]> wrote:
>
>> I spent the last months working on the Stratosphere system, which is
>> developed by my group. It's a research prototype, but it's got so many
>> things that we would need.
>>
>> It extends the MapReduce model, for joins, e.g. there is a new operator
>> called 'Match' which lets you apply your user code to the result of an
>> equi-join. The nice thing is that the system automatically chooses an
>> efficient execution strategy for the join. Having something like this
>> production ready would save us so much code, as a lot of our
>> implementations consist of hand-coded joins.
