so we're comparing to the following, hopefully?
double result = 0.0;
Iterator<Element> iter = iterateNonZero();
while (iter.hasNext()) {
Element element = iter.next();
result += element.get() * x.getQuick(element.index());
}
return result;
On Tue, Mar 12, 2013 at 3:42 PM, Sebastian Schelter <[email protected]> wrote:
> Using this code in DenseVector gives me similar performance:
>
> @Override
> public double dot(Vector x) {
> if (!x.isDense()) {
> return super.dot(x);
> } else {
> double sum = 0;
> final int size = size();
> for (int n = 0; n < size; n++) {
> sum += values[n] * x.getQuick(n);
> }
> return sum;
> }
> }
>
> On 12.03.2013 23:18, Jake Mannix wrote:
> > Ok now that's even weirder than I thought! You're still calling vector
> > methods, you're not doing direct array access or anything...
> >
> > On Tuesday, March 12, 2013, Sebastian Schelter wrote:
> >
> >> I wrote a small benchmark as we investigated this issue recently for
> >> computing recommendations from an ALS factorization.
> >>
> >> https://gist.github.com/sscdotopen/5147521
> >>
> >> Here's some results on my notebook, indicating a 5x to 6x performance
> >> improvement:
> >>
> >> dot() 570ms, direct 98ms
> >> dot() 568ms, direct 97ms
> >> dot() 559ms, direct 97ms
> >> dot() 566ms, direct 98ms
> >> dot() 574ms, direct 98ms
> >> dot() 582ms, direct 101ms
> >> dot() 581ms, direct 98ms
> >> dot() 576ms, direct 98ms
> >> dot() 574ms, direct 97ms
> >> dot() 574ms, direct 99ms
> >> dot() 564ms, direct 98ms
> >> dot() 576ms, direct 97ms
> >> dot() 580ms, direct 100ms
> >>
> >>
> >>
> >> On 12.03.2013 23:00, Sean Owen wrote:
> >>> It's almost certainly the overhead of the iterator creation and
> iterator
> >>> methods.
> >>> DenseVector.dot() is not specialized and the simple dot product method
> >> here
> >>> could as well be placed there. Then the call to DenseVector.dot() would
> >> be
> >>> equally unsurprisingly fast.
> >>>
> >>>
> >>> On Tue, Mar 12, 2013 at 9:56 PM, Jake Mannix <[email protected]
> <javascript:;>>
> >> wrote:
> >>>
> >>>> But then where does it slow down? It just wraps a double[]
> >>>>
> >>>> On Tuesday, March 12, 2013, Sebastian Schelter wrote:
> >>>>
> >>>>> I looked into DenseVector and it doesn't use any primitive
> collections,
> >>>>> so ignore my last mail :)
> >>>>>
> >>>>> On 12.03.2013 22:16, Sebastian Schelter wrote:
> >>>>>> As a sidenote: I was kinda shocked recently, that switching from
> >>>>>> DenseVector's dot() method to a direct dot product computation gave
> a
> >>>> 3x
> >>>>>> increase in performance in
> >>>>>> org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.
> >>>>>>
> >>>>>> It seems like we really have a performance problem for some
> usecases.
> >>>>>>
> >>>>>> On 12.03.2013 22:04, Dawid Weiss wrote:
> >>>>>>>> The primary use case for mahout collections is directly *inside*
> of
> >>>>>>>> our Vector interface. Which is to say, it's not directly exposed
> to
> >>>>>>>> most users, and we don't really expose the ability to do guava
> >>>>> collections
> >>>>>>>> stuff on them at all: We Do Math. :) So in particular, we don't
> >>>> expose
> >>>>>>>
> >>>>>>> Fair enough. But you might want to expose some of it at some point
> >> and
> >>>>>>> if this happens it
> >>>>>>> may just be ready for you.
> >>>>>>>
> >>>>>>>> Question is whether there's anything to be gained by just swapping
> >>>>>>>> our own collections *out* for something else, like HPPC or
> fastutil.
> >>>>>>>
> >>>>>>> Depends. Speed optimizations may be one reason -- you'd need to
> check
> >>>>>>> if the code gains anything by using these libraries compared to
> >> Mahout
> >>>>>>> collections. While microbenchmarks may show large differences my
> bet
> >>>>>>> is that overall results, taking into account
> >>>>>>> computations and, God forbid, I/O, will be within noise range
> unless
> >>>>>>> you're really using these data structures a *lot* in tight loops.
> The
> >>>>>>> only practical benefit I see is getting rid of a chunk of code you
> >>>>>>> don't wish to
> >>>>>>> maintain (like you said: missing features, unit tests, etc.). But I
> >>>>>>> don't negate there is some entertainment value in going back to
> such
> >>>>>>> fundamental data structures and trying to squeeze the last bit of
> >>>>>>> performance out of them. :)
> >>>>>>>
> >>>>>>> Dawid
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>>
> >>>> -jake
> >>>>
> >>>
> >>
> >>
> >
>
>
--
-jake