On 20 October 2017 at 18:43, Nandish Jayaram <njaya...@pivotal.io> wrote: > Thank you for following up on that JIRA James. Based on some more code > exploration, it looks like we should be able to replace the native > implementation > of array_dot() with Eigen's dot() function. array_dot() currently takes in > `anyarray` > as you pointed out, and cosine_similarity() takes in double precision > arrays. > > - But I was able to run cosine_similarity() on int[], float8[] and double > precision[] > vector pairs without any issues. > - I also checked that the current array_dot() returns a float8, and not > the type of the input arrays, while cosine_similarity() returns a double. > - Internally in MADlib, a few modules (GLM, SVM, SVD, matrix_ops, and > conjugate > gradient) use the array_dot() function, and they too should not be affected > by > this change. > > So it looks like there might not be any backward compatibility breaking > changes if we replace the native array_dot() with Eigen's dot(). >
Testing locally, eigen array_dot is much faster for doubles, but normal array_dot is a bit faster for float4. I don't have enough knowledge of the internals of either postgres or madlib to say exactly why this is. Maybe postgres is casting float4[] to float8[] when calling postgres functions defined as taking doubles, or maybe postgres itself doesn't cast but rather then internals of array_ops.c are written in such a way as to be faster for float4 than the internals of Eigen. But it seems that even if switching out the implementation totally isn't actually a breaking change, it would cause a slight performance degradation for people not using double precision.