Re: Performance of array_dot vs cosine_similarity, continued

James Gregory Fri, 20 Oct 2017 13:59:52 -0700

On 20 October 2017 at 18:43, Nandish Jayaram <njaya...@pivotal.io> wrote:
> Thank you for following up on that JIRA James. Based on some more code
> exploration, it looks like we should be able to replace the native
> implementation of array_dot() with Eigen's dot() function. array_dot()
> currently takes in `anyarray` as you pointed out, and cosine_similarity()
> takes in double precision arrays.
>
> - But I was able to run cosine_similarity() on int[], float8[] and double
>   precision[] vector pairs without any issues.
> - I also checked that the current array_dot() returns a float8, and not
>   the type of the input arrays, while cosine_similarity() returns a double.
> - Internally in MADlib, a few modules (GLM, SVM, SVD, matrix_ops, and
>   conjugate gradient) use the array_dot() function, and they too should
>   not be affected by this change.
>
> So it looks like there might not be any backward compatibility breaking
> changes if we replace the native array_dot() with Eigen's dot().
>
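For anyone comparing the two functions, here is a minimal Python sketch of the semantics described in the quote. It is a dot product that accepts any numeric element type but always accumulates and returns a double, which is the analogue of array_dot() returning float8. A cosine similarity is built on top of it. The function names are hypothetical and this is not MADlib's code. Raising an error on mismatched lengths is an assumption about how such inputs are handled.

```python
import math
from typing import Sequence


def array_dot_sketch(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in for array_dot(): whatever the element type
    (int, float4-like, float8-like), the sum is accumulated and returned
    as a Python float, i.e. a double, like array_dot() returning float8."""
    if len(a) != len(b):
        # Assumption: mismatched lengths are treated as an error.
        raise ValueError("arrays must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        total += float(x) * float(y)
    return total


def cosine_similarity_sketch(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical cosine similarity: dot(a, b) / (|a| * |b|),
    with both norms computed from the same dot product."""
    norm_a = math.sqrt(array_dot_sketch(a, a))
    norm_b = math.sqrt(array_dot_sketch(b, b))
    return array_dot_sketch(a, b) / (norm_a * norm_b)


print(array_dot_sketch([1, 2, 3], [4, 5, 6]))  # int inputs, float result
```

Both functions return a double whatever the element type. Swapping the dot-product implementation underneath should therefore not change the result type that callers see, which is the backward-compatibility point made in the quote.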


Testing locally, the Eigen-based array_dot is much faster for doubles, but
the normal array_dot is a bit faster for float4. I don't know enough about
the internals of either Postgres or MADlib to say exactly why. Maybe
Postgres casts float4[] to float8[] when calling functions declared as
taking doubles. Or maybe Postgres itself doesn't cast, and the internals
of array_ops.c are simply written in a way that is faster for float4 than
the internals of Eigen.

But it seems that even if switching out the implementation entirely
isn't a breaking change, it would cause a slight performance
degradation for people not using double precision.
