Re: mahout collections updates

Sebastian Schelter Tue, 12 Mar 2013 14:21:55 -0700

I looked into DenseVector and it doesn't use any primitive collections,
so ignore my last mail :)


On 12.03.2013 22:16, Sebastian Schelter wrote:
> As a side note: I was kinda shocked recently that switching from
> DenseVector's dot() method to a direct dot product computation gave a 3x
> increase in performance in
> org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.
> 
> It seems like we really have a performance problem for some use cases.
> 
> On 12.03.2013 22:04, Dawid Weiss wrote:
>>> The primary use case for mahout collections is directly *inside* of
>>> our Vector interface.  Which is to say, it's not directly exposed to
>>> most users, and we don't really expose the ability to do guava collections
>>> stuff on them at all: We Do Math. :)  So in particular, we don't expose
>>
>> Fair enough. But you might want to expose some of it at some point and
>> if this happens it may just be ready for you.
>>
>>> Question is whether there's anything to be gained by just swapping
>>> our own collections *out* for something else, like HPPC or fastutil.
>>
>> Depends. Speed optimizations may be one reason -- you'd need to check
>> if the code gains anything by using these libraries compared to Mahout
>> collections. While microbenchmarks may show large differences, my bet
>> is that overall results, taking into account computations and, God
>> forbid, I/O, will be within noise range unless you're really using
>> these data structures a *lot* in tight loops. The only practical
>> benefit I see is getting rid of a chunk of code you don't wish to
>> maintain (like you said: missing features, unit tests, etc.). But I
>> don't deny there is some entertainment value in going back to such
>> fundamental data structures and trying to squeeze the last bit of
>> performance out of them. :)
>>
>> Dawid
>>
>
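
For readers following the dot() comparison in the quoted side note: a "direct dot product computation" presumably means a plain loop over primitive double arrays, with no Vector abstraction in between. The sketch below only illustrates that idea. It is not the actual change made in RecommenderJob, and the class name DirectDot and the variable names are hypothetical.

```java
public final class DirectDot {
    private DirectDot() {}

    /**
     * Dot product over raw double arrays: a single tight loop,
     * no iterator objects and no virtual calls per element.
     */
    public static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "length mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical feature vectors, e.g. one user row and one item row.
        double[] userFeatures = {1.0, 2.0, 3.0};
        double[] itemFeatures = {4.0, 5.0, 6.0};
        System.out.println(dot(userFeatures, itemFeatures)); // prints 32.0
    }
}
```

Whether a loop like this beats a library method in a given job depends on the call site and the JIT, so any speedup should be measured in the job itself rather than assumed.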
