But then where does it slow down? It just wraps a double[].

On Tuesday, March 12, 2013, Sebastian Schelter wrote:
> I looked into DenseVector and it doesn't use any primitive collections,
> so ignore my last mail :)
>
> On 12.03.2013 22:16, Sebastian Schelter wrote:
> > As a sidenote: I was kinda shocked recently, that switching from
> > DenseVector's dot() method to a direct dot product computation gave a 3x
> > increase in performance in
> > org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.
> >
> > It seems like we really have a performance problem for some usecases.
> >
> > On 12.03.2013 22:04, Dawid Weiss wrote:
> >>> The primary use case for mahout collections is directly *inside* of
> >>> our Vector interface. Which is to say, it's not directly exposed to
> >>> most users, and we don't really expose the ability to do guava
> >>> collections stuff on them at all: We Do Math. :) So in particular,
> >>> we don't expose
> >>
> >> Fair enough. But you might want to expose some of it at some point and
> >> if this happens it may just be ready for you.
> >>
> >>> Question is whether there's anything to be gained by just swapping
> >>> our own collections *out* for something else, like HPPC or fastutil.
> >>
> >> Depends. Speed optimizations may be one reason -- you'd need to check
> >> if the code gains anything by using these libraries compared to Mahout
> >> collections. While microbenchmarks may show large differences my bet
> >> is that overall results, taking into account computations and, God
> >> forbid, I/O, will be within noise range unless you're really using
> >> these data structures a *lot* in tight loops. The only practical
> >> benefit I see is getting rid of a chunk of code you don't wish to
> >> maintain (like you said: missing features, unit tests, etc.). But I
> >> don't negate there is some entertainment value in going back to such
> >> fundamental data structures and trying to squeeze the last bit of
> >> performance out of them. :)
> >>
> >> Dawid
> >>
> >
> >

--
-jake
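
For reference, here is a minimal sketch of the two code paths being compared in this thread: a dot product that goes through a vector abstraction versus one computed directly on the backing double[]. The names SimpleVector, ArrayBackedVector and DotSketch are hypothetical stand-ins made up for illustration. This is not Mahout's DenseVector code, and it makes no claim about where the reported 3x difference actually comes from. It only shows the shape of the comparison one would want to benchmark.

```java
// Hypothetical stand-in for a vector abstraction; NOT Mahout's Vector interface.
interface SimpleVector {
    int size();
    double get(int index);
}

// Hypothetical wrapper around a double[]; an assumption for illustration only.
final class ArrayBackedVector implements SimpleVector {
    private final double[] values;

    ArrayBackedVector(double[] values) {
        this.values = values;
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

final class DotSketch {
    // Generic path: every element access is an interface method call.
    static double genericDot(SimpleVector a, SimpleVector b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }

    // Direct path: a plain loop over the two primitive arrays.
    static double directDot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        // Both paths compute the same value; only the access pattern differs.
        System.out.println(genericDot(new ArrayBackedVector(x), new ArrayBackedVector(y)));
        System.out.println(directDot(x, y));
    }
}
```

Both methods return the same result, so any timing gap between them would have to come from per-element call overhead or whatever else the real dot() does around the loop. Per Dawid's point above, that gap would need to be measured inside the actual job rather than in a microbenchmark alone.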
