Thank you, Andrew and Dmitriy, for your informed responses.

> On May 2, 2016, at 8:59 PM, Dmitriy Lyubimov <[email protected]> wrote:
> 
> also, mahout does have an optimizer that simply decides on the degree of
> parallelism of the _product_. I.e., if it computes C=A'B then it figures
> that the final result should be split N ways. but it doesn't apply a
> partition function of its own -- it just uses the usual hash partitioner to
> forward the keys, i don't think we ever override that.
> 
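
To make the hash-partitioner point above concrete, here is a minimal Python sketch. It is not Mahout or Spark code. It only assumes the usual rule of sending a key to partition hash(key) mod N, with integer row keys; the names hash_partition and split_rows are made up for illustration.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Map a key to a partition index in [0, num_partitions)."""
    # Python's % is non-negative for a positive modulus, even if hash(key) < 0.
    return hash(key) % num_partitions


def split_rows(row_keys, num_partitions):
    """Group row keys of a result matrix by the partition they hash to."""
    partitions = defaultdict(list)
    for key in row_keys:
        partitions[hash_partition(key, num_partitions)].append(key)
    return dict(partitions)


if __name__ == "__main__":
    # Hypothetical case: the product C has 12 rows keyed 0..11, split N=4 ways.
    for index, keys in sorted(split_rows(range(12), 4).items()):
        print(index, keys)
```

Nothing here looks at the data or at load: the only choice made is N, and the hash decides where each key lands.
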
> On Mon, May 2, 2016 at 9:39 AM, Dmitriy Lyubimov <[email protected]> wrote:
> 
>> by probabilistic algorithms i mostly mean inference involving monte carlo
>> type mechanisms (Gibbs sampling LDA, which i think might still be part of
>> our MR collection, might be an example), as well as its faster counterpart,
>> variational Bayes inference.
>> 
>> the parallelization strategies are just standard spark mechanisms (in
>> case of spark), mostly using their standard hash samplers (which, in
>> math speak, are really uniform multinomial samplers).
>> 
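
One way to read the "uniform multinomial" remark: hashing many distinct keys into N buckets behaves roughly like drawing each key's bucket with probability 1/N. A rough Python sketch of that reading follows. It is an illustration only, not what Spark does internally. SHA-256 is used here just as a stable, well-mixed hash (an assumption of this sketch), and bucket_of and bucket_counts are hypothetical helper names.

```python
import hashlib
from collections import Counter


def bucket_of(key, num_buckets):
    """Assign a string key to a bucket using a stable hash of its bytes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod N.
    return int.from_bytes(digest[:8], "big") % num_buckets


def bucket_counts(keys, num_buckets):
    """Count how many keys land in each bucket, in bucket order."""
    counts = Counter(bucket_of(k, num_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(10000)]
    counts = bucket_counts(keys, 4)
    # Each empirical frequency should sit close to 1/4.
    print(counts)
    print([round(c / len(keys), 3) for c in counts])
```

The spread comes from the hash mixing the keys, not from a random number generator, so the same keys always land in the same buckets.
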
>> On Mon, May 2, 2016 at 9:25 AM, Khurrum Nasim <[email protected]>
>> wrote:
>> 
>>> Hey Dmitriy -
>>> 
>>> Yes, I meant probabilistic algorithms.  If mahout doesn’t use
>>> probabilistic algos then how does it accomplish a degree of optimal
>>> parallelization? Wouldn’t you need randomization to spread out the
>>> processing of tasks?
>>> 
>>>> On May 2, 2016, at 12:13 PM, Dmitriy Lyubimov <[email protected]>
>>> wrote:
>>>> 
>>>> yes mahout has stochastic svd and pca which are described at length in the
>>>> samsara book. The book examples in Andrew Palumbo's github also contain an
>>>> example of computing a k-means|| sketch.
>>>> 
>>>> if you mean _probabilistic_ algorithms, although i have done some things
>>>> outside the public domain, nothing has been contributed.
>>>> 
>>>> You are very welcome to try something if you don't have big constraints on
>>>> oss contribution.
>>>> 
>>>> -d
>>>> 
>>>> On Mon, May 2, 2016 at 7:49 AM, Khurrum Nasim <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hey All,
>>>>> 
>>>>> I’d like to know if Mahout uses any randomized algorithms.  I’m thinking
>>>>> it probably does.  Can somebody point me to the packages that utilize
>>>>> randomized algos?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Khurrum
>>>>> 
