Re: Discussion Of ML environment/MR, Mahout

Sebastian Schelter Tue, 12 Mar 2013 00:47:37 -0700

Hi Ted,

Sounds good to me.


If the next release of Giraph looks promising (I had some problems with
0.1 and the current trunk), I'd definitely like to start an
experimental, possibly cut-away-later integration.

I have never worked with Spark, so I'm not sure whether that should also be
integrated.

On 12.03.2013 01:01, Ted Dunning wrote:
> Why not (b) if (b) implies Giraph (which seems to have some momentum) or
> Spark (which has its own momentum and was originally designed to support
> machine learning anyway)?
> 
> Also, why not (b) if we agree now that it is an experiment that we will
> cut away if it leads to a mess?
> 
> On Mon, Mar 11, 2013 at 2:39 PM, Sebastian Schelter <[email protected]> wrote:
> 
>> That's a tough question. I'd say we should only consider a) or c), as it
>> makes no sense to depend on some research prototype system that might
>> vanish once people get their funding cut.
>>
>> On 11.03.2013 22:11, Dmitriy Lyubimov wrote:
>>> Ok,
>>>
>>> So, getting back, what do you think would be a good way to solve ALS-like
>>> issues within the Mahout context?
>>>
>>> I see just the following:
>>>
>>> a) wait for Yarn + whatever bulk parallel environment is built for it?
>>>
>>> b) introduce adapters to synchronous or dynamic bulk parallel distributed
>>> environments -- if yes, which ones? Worth a try to step there? Is it a good
>>> idea to collaborate with non-Apache projects here?
>>>
>>> c) do nothing (no good ALS in Mahout)?
>>>
>>> I would happily explore b) and open discussion on it if the majority supported
>>> it. I guess I am fundamentally fine with c) too :)  I feel a) is not really
>>> an option and in a way is equivalent to c), since it involves an unspecified
>>> amount of waiting for unspecified things.
>>>
>>>
>>>
>>> On Mon, Mar 11, 2013 at 1:54 PM, Sebastian Schelter <[email protected]> wrote:
>>>
>>>> I spent the last months working on the Stratosphere system, which is
>>>> developed by my group. It's a research prototype, but it's got so many
>>>> things that we would need.
>>>>
>>>> It extends the MapReduce model. For joins, e.g., there is a new operator
>>>> called 'Match' which lets you apply your user code to the result of an
>>>> equi-join. The nice thing is that the system automatically chooses an
>>>> efficient execution strategy for the join. Having something like this
>>>> production ready would save us so much code, as a lot of our
>>>> implementations consist of hand-coded joins.
>>>>
>>>>
>>>
>>
>>
>
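
To make the 'Match' idea from the quoted message concrete, here is a minimal
sketch in plain Python of an operator that applies user code to each result of
an equi-join. It is not Stratosphere's API: the names match, user_fn and scale,
and the sample data, are hypothetical and chosen only for illustration. The
sketch always uses a hash join that builds on the left input, whereas the
system described above is said to choose the execution strategy automatically.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def match(
    left: Iterable[Tuple[K, A]],
    right: Iterable[Tuple[K, B]],
    user_fn: Callable[[K, A, B], R],
) -> Iterator[R]:
    """Hypothetical stand-in for a 'Match'-style operator (an assumption,
    not Stratosphere's API).

    Calls user_fn once for every pair of records whose keys are equal,
    i.e. once per row of the equi-join of left and right.
    """
    # Build phase: index the left input by key.
    table = defaultdict(list)
    for key, value in left:
        table[key].append(value)

    # Probe phase: stream the right input and emit one result per matching pair.
    for key, right_value in right:
        for left_value in table.get(key, ()):
            yield user_fn(key, left_value, right_value)


# Illustrative data: ratings keyed by item, and item feature vectors keyed by item.
ratings = [
    ("item1", ("alice", 4.0)),
    ("item2", ("alice", 2.0)),
    ("item1", ("bob", 5.0)),
]
item_features = [("item1", [0.5, 1.0]), ("item3", [1.0, 1.0])]


def scale(item, rating, features):
    # User code: sees only one joined pair and knows nothing about the join itself.
    user, value = rating
    return (user, item, [value * f for f in features])


joined = list(match(ratings, item_features, scale))
```

The user function only handles a single joined pair, so the hand-coded join
logic lives in one place (here, match) instead of being repeated in every job.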
