It's almost certainly the overhead of creating the iterator and calling its methods for every element. DenseVector.dot() is not specialized, and the simple dot product method used here could just as well be placed there. A call to DenseVector.dot() would then be equally, and unsurprisingly, fast.
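
To make the difference concrete, here is a minimal sketch of the two code paths. It is not Mahout's code: DotSketch, DenseVec and Element are hypothetical stand-ins, and allocating one Element per entry is an assumption made only to illustrate the kind of per-element cost a generic iterator can add.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only; none of these types are Mahout classes.
final class DotSketch {

    /** Minimal (index, value) view, as a generic vector iterator might hand out. */
    static final class Element {
        final int index;
        final double value;

        Element(int index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    /** Wraps a double[] and exposes its entries through an iterator. */
    static final class DenseVec implements Iterable<Element> {
        final double[] values;

        DenseVec(double[] values) {
            this.values = values;
        }

        @Override
        public Iterator<Element> iterator() {
            return new Iterator<Element>() {
                private int i = 0;

                @Override
                public boolean hasNext() {
                    return i < values.length;
                }

                @Override
                public Element next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    // Assumption: one small object per entry.
                    Element e = new Element(i, values[i]);
                    i++;
                    return e;
                }
            };
        }
    }

    /** Generic path: iterator creation plus two method calls per entry. */
    static double iteratorDot(DenseVec a, DenseVec b) {
        if (a.values.length != b.values.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (Element e : a) {
            sum += e.value * b.values[e.index];
        }
        return sum;
    }

    /** Specialized path: a plain loop over the two backing arrays. */
    static double directDot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        System.out.println(iteratorDot(new DenseVec(x), new DenseVec(y)));
        System.out.println(directDot(x, y));
    }
}
```

Both methods return the same value; the only difference is how much work is done per element. Whether the JIT removes the iterator overhead depends on the actual code and JVM, so any real comparison would need a proper benchmark.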

On Tue, Mar 12, 2013 at 9:56 PM, Jake Mannix <[email protected]> wrote:

> But then where does it slow down? It just wraps a double[]
>
> On Tuesday, March 12, 2013, Sebastian Schelter wrote:
>
> > I looked into DenseVector and it doesn't use any primitive collections,
> > so ignore my last mail :)
> >
> > On 12.03.2013 22:16, Sebastian Schelter wrote:
> > > As a sidenote: I was kinda shocked recently, that switching from
> > > DenseVector's dot() method to a direct dot product computation gave a 3x
> > > increase in performance in
> > > org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.
> > >
> > > It seems like we really have a performance problem for some usecases.
> > >
> > > On 12.03.2013 22:04, Dawid Weiss wrote:
> > >>> The primary use case for mahout collections is directly *inside* of
> > >>> our Vector interface. Which is to say, it's not directly exposed to
> > >>> most users, and we don't really expose the ability to do guava collections
> > >>> stuff on them at all: We Do Math. :) So in particular, we don't expose
> > >>
> > >> Fair enough. But you might want to expose some of it at some point and
> > >> if this happens it
> > >> may just be ready for you.
> > >>
> > >>> Question is whether there's anything to be gained by just swapping
> > >>> our own collections *out* for something else, like HPPC or fastutil.
> > >>
> > >> Depends. Speed optimizations may be one reason -- you'd need to check
> > >> if the code gains anything by using these libraries compared to Mahout
> > >> collections. While microbenchmarks may show large differences my bet
> > >> is that overall results, taking into account
> > >> computations and, God forbid, I/O, will be within noise range unless
> > >> you're really using these data structures a *lot* in tight loops. The
> > >> only practical benefit I see is getting rid of a chunk of code you
> > >> don't wish to
> > >> maintain (like you said: missing features, unit tests, etc.). But I
> > >> don't negate there is some entertainment value in going back to such
> > >> fundamental data structures and trying to squeeze the last bit of
> > >> performance out of them. :)
> > >>
> > >> Dawid
> > >>
> > >
> >
> >
> --
> -jake
>
