As a side note: I was kind of shocked recently that switching from DenseVector's dot() method to a direct dot product computation gave a 3x increase in performance in org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.
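To make the comparison concrete, here is a minimal sketch of the two styles, not the actual Mahout code: `DotSketch`, `Vec` and `ArrayVec` are hypothetical stand-ins I made up for illustration, and the sketch says nothing about how DenseVector.dot() is really implemented. It only contrasts a dot product that goes through an interface for every element with one that loops over the raw double[] arrays.

```java
public class DotSketch {

    /** Hypothetical minimal vector abstraction; NOT Mahout's Vector interface. */
    public interface Vec {
        int size();
        double get(int index);
    }

    /** Hypothetical array-backed implementation of the stand-in interface. */
    public static final class ArrayVec implements Vec {
        private final double[] values;

        public ArrayVec(double[] values) {
            this.values = values;
        }

        @Override
        public int size() {
            return values.length;
        }

        @Override
        public double get(int index) {
            return values[index];
        }
    }

    /** Generic variant: every element access is an interface call. */
    public static double dotViaInterface(Vec a, Vec b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vector sizes differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }

    /** Direct variant: a plain loop over the backing arrays. */
    public static double dotDirect(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        // Both variants compute 1*4 + 2*5 + 3*6 and print 32.0.
        System.out.println(dotViaInterface(new ArrayVec(x), new ArrayVec(y)));
        System.out.println(dotDirect(x, y));
    }
}
```

Both variants return the same value; whether the direct loop is actually faster in a given job depends on the JIT and the surrounding code, so it would need to be measured rather than assumed.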
It seems like we really have a performance problem for some use cases.

On 12.03.2013 22:04, Dawid Weiss wrote:
>> The primary use case for mahout collections is directly *inside* of
>> our Vector interface. Which is to say, it's not directly exposed to
>> most users, and we don't really expose the ability to do guava collections
>> stuff on them at all: We Do Math. :) So in particular, we don't expose
>
> Fair enough. But you might want to expose some of it at some point and
> if this happens it may just be ready for you.
>
>> Question is whether there's anything to be gained by just swapping
>> our own collections *out* for something else, like HPPC or fastutil.
>
> Depends. Speed optimizations may be one reason -- you'd need to check
> if the code gains anything by using these libraries compared to Mahout
> collections. While microbenchmarks may show large differences my bet
> is that overall results, taking into account computations and, God
> forbid, I/O, will be within noise range unless you're really using
> these data structures a *lot* in tight loops. The only practical
> benefit I see is getting rid of a chunk of code you don't wish to
> maintain (like you said: missing features, unit tests, etc.). But I
> don't negate there is some entertainment value in going back to such
> fundamental data structures and trying to squeeze the last bit of
> performance out of them. :)
>
> Dawid
>
