Re: mahout collections updates

Sebastian Schelter Tue, 12 Mar 2013 14:17:07 -0700

As a side note: I was kinda shocked recently that switching from
DenseVector's dot() method to a direct dot product computation gave a 3x
increase in performance in
org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.


It seems like we really have a performance problem for some use cases.
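
To illustrate what "direct dot product computation" can mean here, a
minimal sketch follows. It assumes the vector values are already
available as plain double[] arrays; the class name DirectDotSketch and
the method dot are made up for this example and are neither Mahout API
nor the actual change made in RecommenderJob.

```java
// Hypothetical illustration only: not Mahout code.
final class DirectDotSketch {

    // Dot product as a plain loop over two primitive arrays,
    // with no per-element virtual calls or iterator objects.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        System.out.println(dot(x, y)); // prints 32.0
    }
}
```

Whether a loop like this beats the library method in a given job
depends on the vector types involved and on what the JIT can inline, so
it should be measured in the real workload rather than assumed.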

On 12.03.2013 22:04, Dawid Weiss wrote:
>> The primary use case for mahout collections is directly *inside* of
>> our Vector interface.  Which is to say, it's not directly exposed to
>> most users, and we don't really expose the ability to do guava collections
>> stuff on them at all: We Do Math. :)  So in particular, we don't expose
> 
> Fair enough. But you might want to expose some of it at some point and
> if this happens it may just be ready for you.
> 
>> Question is whether there's anything to be gained by just swapping
>> our own collections *out* for something else, like HPPC or fastutil.
> 
> Depends. Speed optimizations may be one reason -- you'd need to check
> if the code gains anything by using these libraries compared to Mahout
> collections. While microbenchmarks may show large differences my bet
> is that overall results, taking into account computations and, God
> forbid, I/O, will be within noise range unless
> you're really using these data structures a *lot* in tight loops. The
> only practical benefit I see is getting rid of a chunk of code you
> don't wish to maintain (like you said: missing features, unit tests,
> etc.). But I don't negate there is some entertainment value in going
> back to such fundamental data structures and trying to squeeze the
> last bit of performance out of them. :)
> 
> Dawid
>
