Re: Discussion Of ML environment/MR, Mahout

Sebastian Schelter Mon, 11 Mar 2013 13:24:13 -0700

Ideally, as the implementor of a machine learning library, you wouldn't
want to think about how to execute joins most efficiently. It's data
dependent anyway in most cases. You would want an optimizer, similar to
the ones used in databases, that takes your MapReduce data flow and
figures out the best way to execute it.
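
To illustrate the kind of data-dependent decision such an optimizer might
make, here is a minimal Python sketch. The names (InputStats,
choose_join_strategy), the statistics it consults, and the memory threshold
are all hypothetical; none of this is part of Mahout or Hadoop, and a real
optimizer would rely on a far richer cost model.

```python
from dataclasses import dataclass


@dataclass
class InputStats:
    """Hypothetical statistics an optimizer might know about a join input."""
    num_records: int
    sorted_on_key: bool


def choose_join_strategy(left: InputStats, right: InputStats,
                         memory_budget_records: int = 1_000_000) -> str:
    """Pick a join strategy from simple, data-dependent rules (assumed)."""
    # Both inputs already ordered by the join key: stream them, no sort needed.
    if left.sorted_on_key and right.sorted_on_key:
        return "streaming-merge"
    # One side fits in memory: build a hash table on it, stream the other.
    if min(left.num_records, right.num_records) <= memory_budget_records:
        return "hash"
    # Otherwise fall back to the general sort-based (reduce-side) join.
    return "sort-merge"
```

The point is only that the library author states the join once, and the
choice of physical strategy is made from properties of the data.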


On 11.03.2013 21:16, Ted Dunning wrote:
> Kinda sorta..
> 
> You can defeat most of the sort if you want to just hash things to buckets.
> 
> On Mon, Mar 11, 2013 at 12:01 PM, Dmitriy Lyubimov <[email protected]> wrote:
> 
>> The sort component adds a log factor to the asymptotic complexity,
>> whereas it is clear that any streaming merge algorithm wouldn't need to
>> sort at all and could capitalize on the structure we already know.
>> (Sure, you can do it map-side with specific streaming join logic, but
>> that would not be pure MR but rather some map task acrobatics.)
>>
>
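
The two sort-free approaches mentioned in the quoted messages can be
sketched as follows. This is a plain Python illustration, not MapReduce
code and not taken from Mahout: merge_join assumes both inputs are already
sorted by key and makes a single forward pass over each, while hash_join
assumes one side fits in memory and hashes it into buckets. The function
names are made up for this example, and lists stand in for the input
streams for brevity.

```python
from collections import defaultdict


def merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key.

    One forward pass over each input; no sort step is performed.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of records with this key on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    yield lk, left[i][1], rv
                i += 1
            j = j_end


def hash_join(left, right):
    """Join unsorted (key, value) pairs by hashing the left side to buckets."""
    buckets = defaultdict(list)
    for k, lv in left:
        buckets[k].append(lv)
    # Stream the right side and probe the buckets; input order is irrelevant.
    for k, rv in right:
        for lv in buckets.get(k, ()):
            yield k, lv, rv
```

Both functions produce the same set of joined records on the same inputs;
they differ in what they require (pre-sorted inputs versus enough memory
for one side), which is why the better choice depends on the data.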
