OK, now that's even weirder than I thought! You're still calling vector methods; you're not doing direct array access or anything...
On Tuesday, March 12, 2013, Sebastian Schelter wrote:

> I wrote a small benchmark as we investigated this issue recently for
> computing recommendations from an ALS factorization.
>
> https://gist.github.com/sscdotopen/5147521
>
> Here are some results on my notebook, indicating a 5x to 6x performance
> improvement:
>
> dot() 570ms, direct 98ms
> dot() 568ms, direct 97ms
> dot() 559ms, direct 97ms
> dot() 566ms, direct 98ms
> dot() 574ms, direct 98ms
> dot() 582ms, direct 101ms
> dot() 581ms, direct 98ms
> dot() 576ms, direct 98ms
> dot() 574ms, direct 97ms
> dot() 574ms, direct 99ms
> dot() 564ms, direct 98ms
> dot() 576ms, direct 97ms
> dot() 580ms, direct 100ms
>
> On 12.03.2013 23:00, Sean Owen wrote:
> > It's almost certainly the overhead of the iterator creation and iterator
> > methods. DenseVector.dot() is not specialized, and the simple dot product
> > method here could just as well be placed there. Then the call to
> > DenseVector.dot() would be equally, unsurprisingly, fast.
> >
> > On Tue, Mar 12, 2013 at 9:56 PM, Jake Mannix <[email protected]> wrote:
> >
> >> But then where does it slow down? It just wraps a double[].
> >>
> >> On Tuesday, March 12, 2013, Sebastian Schelter wrote:
> >>
> >>> I looked into DenseVector and it doesn't use any primitive collections,
> >>> so ignore my last mail :)
> >>>
> >>> On 12.03.2013 22:16, Sebastian Schelter wrote:
> >>>> As a side note: I was kind of shocked recently that switching from
> >>>> DenseVector's dot() method to a direct dot product computation gave a
> >>>> 3x increase in performance in
> >>>> org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.
> >>>>
> >>>> It seems like we really have a performance problem for some use cases.
> >>>>
> >>>> On 12.03.2013 22:04, Dawid Weiss wrote:
> >>>>>> The primary use case for mahout collections is directly *inside* of
> >>>>>> our Vector interface. Which is to say, it's not directly exposed to
> >>>>>> most users, and we don't really expose the ability to do guava
> >>>>>> collections stuff on them at all: We Do Math. :) So in particular,
> >>>>>> we don't expose
> >>>>>
> >>>>> Fair enough. But you might want to expose some of it at some point,
> >>>>> and if this happens it may just be ready for you.
> >>>>>
> >>>>>> Question is whether there's anything to be gained by just swapping
> >>>>>> our own collections *out* for something else, like HPPC or fastutil.
> >>>>>
> >>>>> Depends. Speed optimizations may be one reason -- you'd need to check
> >>>>> whether the code gains anything by using these libraries compared to
> >>>>> Mahout collections. While microbenchmarks may show large differences,
> >>>>> my bet is that overall results, taking into account computations and,
> >>>>> God forbid, I/O, will be within noise range unless you're really
> >>>>> using these data structures a *lot* in tight loops. The only
> >>>>> practical benefit I see is getting rid of a chunk of code you don't
> >>>>> wish to maintain (like you said: missing features, unit tests, etc.).
> >>>>> But I don't deny there is some entertainment value in going back to
> >>>>> such fundamental data structures and trying to squeeze the last bit
> >>>>> of performance out of them. :)
> >>>>>
> >>>>> Dawid
> >>
> >> --
> >> -jake

--
-jake
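
The hypothesis in the thread is that the gap comes from computing the dot product through a per-element iterator (object allocation, Double boxing, and virtual calls per step) instead of a tight loop over the backing double[]. Below is a minimal, JDK-only sketch of that comparison. It is not the code from the gist above and does not use Mahout's DenseVector; the ToyVector and DotBenchmarkSketch names are made up for illustration.

import java.util.Iterator;

public class DotBenchmarkSketch {

  /** Tiny stand-in for a dense vector backed by a double[]. */
  static final class ToyVector implements Iterable<Double> {
    final double[] values;

    ToyVector(double[] values) {
      this.values = values;
    }

    public Iterator<Double> iterator() {
      return new Iterator<Double>() {
        int i = 0;
        public boolean hasNext() { return i < values.length; }
        public Double next() { return values[i++]; }   // boxes every element
        public void remove() { throw new UnsupportedOperationException(); }
      };
    }
  }

  /** Iterator-based dot product: allocation, boxing, and virtual calls per element. */
  static double dotViaIterator(ToyVector a, ToyVector b) {
    double sum = 0;
    Iterator<Double> itB = b.iterator();
    for (double x : a) {
      sum += x * itB.next();
    }
    return sum;
  }

  /** Direct dot product over the backing arrays: a tight primitive loop. */
  static double dotDirect(ToyVector a, ToyVector b) {
    double sum = 0;
    for (int i = 0; i < a.values.length; i++) {
      sum += a.values[i] * b.values[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    double[] raw = new double[1000];
    for (int i = 0; i < raw.length; i++) {
      raw[i] = i * 0.5;
    }
    ToyVector a = new ToyVector(raw);
    ToyVector b = new ToyVector(raw.clone());

    // Crude timing loop in the spirit of the gist; a serious measurement
    // should use a proper harness (e.g. JMH) to handle JIT warm-up.
    for (int round = 0; round < 5; round++) {
      long t0 = System.currentTimeMillis();
      double s1 = 0;
      for (int n = 0; n < 100000; n++) {
        s1 += dotViaIterator(a, b);
      }
      long t1 = System.currentTimeMillis();
      double s2 = 0;
      for (int n = 0; n < 100000; n++) {
        s2 += dotDirect(a, b);
      }
      long t2 = System.currentTimeMillis();
      System.out.println("iterator " + (t1 - t0) + "ms, direct " + (t2 - t1)
          + "ms (checksums " + s1 + " / " + s2 + ")");
    }
  }
}

Exact numbers from this toy will vary by JVM and warm-up and are not comparable to the figures reported above; the sketch only illustrates the mechanism (per-element allocation, boxing, and virtual dispatch) that the thread attributes the dot() overhead to.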
