Re: Discussion Of ML environment/MR, Mahout

Sebastian Schelter Mon, 11 Mar 2013 14:39:34 -0700

That's a tough question. I'd say we should only consider a) or c), as it
makes no sense to depend on some research prototype system that might
vanish once people get their funding cut.


On 11.03.2013 22:11, Dmitriy Lyubimov wrote:
> Ok,
> 
> So, getting back, what do you think would be a good way to solve ALS-like
> issues within the Mahout context?
> 
> I see just the following:
> 
> a) wait for YARN + whatever bulk parallel environment gets built for it?
> 
> b) introduce adapters to synchronous or dynamic bulk parallel distributed
> environments -- if yes, which ones? Worth a try to step there? Is it a good
> idea to collaborate with non-Apache projects here?
> 
> c) do nothing (no good ALS in Mahout)?
> 
> I would happily explore b) and open a discussion on it if the majority
> supported it. I guess I am fundamentally fine with c) too :)  I feel a) is
> not really an option and in a way is equivalent to c), since it involves an
> unspecified amount of waiting for unspecified things.
> 
> 
> 
> On Mon, Mar 11, 2013 at 1:54 PM, Sebastian Schelter <[email protected]> wrote:
> 
>> I spent the last months working on the Stratosphere system, which is
>> developed by my group. It's a research prototype, but it's got so many
>> things that we would need.
>>
>> It extends the MapReduce model. For joins, e.g., there is a new operator
>> called 'Match' which lets you apply your user code to the result of an
>> equi-join. The nice thing is that the system automatically chooses an
>> efficient execution strategy for the join. Having something like this
>> production-ready would save us so much code, as a lot of our
>> implementations consist of hand-coded joins.
>>
>>
>
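
For readers unfamiliar with the idea, the 'Match' operator described in the
quoted message above can be illustrated with a minimal single-machine sketch.
This is not Stratosphere's API: the function name match, its parameters, and
the sample data are all hypothetical, and the "strategy choice" is reduced to
one assumed heuristic (build the hash table on the smaller input). A real
system would pick among several distributed join strategies.

```python
from collections import defaultdict


def match(left, right, left_key, right_key, user_fn):
    """Apply user_fn(l, r) to every pair of records with equal join keys.

    Hypothetical illustration of an equi-join operator that runs user code
    on each matching pair; not taken from any real system's API.
    """
    # Assumed heuristic: build the hash table on the smaller input and
    # probe it with the larger one.
    swap = len(left) > len(right)
    if swap:
        build, build_key, probe, probe_key = right, right_key, left, left_key
    else:
        build, build_key, probe, probe_key = left, left_key, right, right_key

    table = defaultdict(list)
    for rec in build:
        table[build_key(rec)].append(rec)

    results = []
    for rec in probe:
        for other in table.get(probe_key(rec), ()):
            # Restore the (left, right) argument order for the user code.
            l, r = (rec, other) if swap else (other, rec)
            results.append(user_fn(l, r))
    return results


# Made-up sample data: (user, item, rating) joined with (item, name).
ratings = [(1, "a", 5.0), (2, "a", 3.0), (2, "b", 4.0)]
item_names = [("a", "Alpha"), ("b", "Beta")]

joined = match(
    ratings,
    item_names,
    left_key=lambda r: r[1],
    right_key=lambda i: i[0],
    user_fn=lambda r, i: (r[0], i[1], r[2]),
)
```

The caller supplies only the two key extractors and the per-pair function;
how the join is executed stays inside match, which is the kind of hand-coded
join logic the message suggests such an operator would replace.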
