Re: Performance of array_dot vs cosine_similarity, continued

Nandish Jayaram Mon, 23 Oct 2017 12:09:49 -0700

Thank you for the further analysis James. This certainly looks like
something
to change, in MADlib-2.0. Since 2.0 is a backward compatibility breaking
release,
it's just easier to have the same function name and just change its
implementation.
But more importantly, we will have ample time to do some serious performance
testing, since array_dot is used internally by other MADlib modules such as
SVM
and SVD to name a few.


Given the performance impact this particular change could have, I vote to
make
the change. Since it would also incur lots of performance testing, I would
vote to
include this in the 2.0 release.

NJ

On Fri, Oct 20, 2017 at 1:59 PM, James Gregory <[email protected]> wrote:

> On 20 October 2017 at 18:43, Nandish Jayaram <[email protected]> wrote:
> > Thank you for following up on that JIRA James. Based on some more code
> > exploration, it looks like we should be able to replace the native
> > implementation
> > of array_dot() with Eigen's dot() function. array_dot() currently takes
> in
> > `anyarray`
> > as you pointed out, and cosine_similarity() takes in double precision
> > arrays.
> >
> > - But I was able to run cosine_similarity() on int[], float8[] and double
> > precision[]
> > vector pairs without any issues.
> > - I also checked that the current array_dot() returns a float8, and not
> > the type of the input arrays, while cosine_similarity() returns a double.
> > - Internally in MADlib, a few modules (GLM, SVM, SVD, matrix_ops, and
> > conjugate
> > gradient) use the array_dot() function, and they too should not be
> affected
> > by
> > this change.
> >
> > So it looks like there might not be any backward compatibility breaking
> > changes if we replace the native array_dot() with Eigen's dot().
> >
>
> Testing locally, eigen array_dot is much faster for doubles, but
> normal array_dot is a bit faster for float4. I don't have enough
> knowledge of the internals of either postgres or madlib to say exactly
> why this is. Maybe postgres is casting float4[] to float8[] when
> calling postgres functions defined as taking doubles, or maybe
> postgres itself doesn't cast but rather then internals of array_ops.c
> are written in such a way as to be faster for float4 than the
> internals of Eigen.
>
> But it seems that even if switching out the implementation totally
> isn't actually a breaking change, it would cause a slight performance
> degradation for people not using double precision.
>

Re: Performance of array_dot vs cosine_similarity, continued

Reply via email to