Hi Kevin,
Great question, and thanks for posting it to the mailing list!

When comparing floating point values, I would suggest using our "distance in bits" comparison for matrices containing double values. This lets you specify a relative difference between the values, rather than the typical double comparison against an epsilon fixed to an exact value. You can find the comparison method here:

File: src/test/java/org/apache/sysds/test/TestUtils.java
Method: compareMatricesBitAvgDistance

Note that the bit distance is a long specifying how many of the trailing bits of the double values are allowed to differ. It can take any non-negative long value: Long.MAX_VALUE means the values may be completely different, while 0 means the encoded double values must be identical. I would suggest starting with 2^14. It is normal for values to be off by something like 2.0E80 when the values themselves are of that order of magnitude, so it is fine for those tests to accept a deviation of that size.

Furthermore, in SystemDS we use Kahan correction for our double values, which lets us compensate for rounding errors beyond the precision of plain 64-bit double arithmetic. This correction can make the values deviate after a number of operations, so that the difference becomes more exaggerated. I have appended small illustrative sketches of both ideas below the quoted message.

Best regards
Sebastian Baunsgaard

________________________________
From: Kevin Pretterhofer <[email protected]>
Sent: Wednesday, January 13, 2021 12:52:47 PM
To: [email protected]
Subject: [Question] Regarding test cases and floating point errors

Hi all,

I hope this is the right place to ask questions. If not, I am sorry, and it would be nice if you could direct me to the right place.

My question is about the unit tests. Currently I am implementing a simple Gaussian classifier. Besides the class prior probabilities, this implementation also outputs the respective mean values, determinants, and covariance matrices, as well as their inverses. Now I face the problem that the values from my SystemDS implementation and my R implementation are quite far apart for randomly generated test matrices. I assume this is due to floating point errors / floating point precision. At first glance the values look quite similar, but since they are printed in scientific notation, one can clearly see that the absolute magnitude by which they differ is quite large. E.g. for my determinant comparison I got the following:

(1,1): 1.2390121975770675E14 <--> 1.2390101941279517E14
(3,1): 1.510440018532407E85 <--> 1.5104388050968705E85
(2,1): 1.6420264128994816E38 <--> 1.6420263615987703E38
(5,1): 8.881025000211518E70 <--> 8.881037540234089E70
(4,1): 1.7888589555748764E22 <--> 1.78885700537877E22

I face similar issues with the inverses of my covariance matrices. Since I use the eigenvalues and eigenvectors to calculate the determinant and the inverse in SystemDS, I have already compared them to the eigenvalues and eigenvectors that R computes, and even there, differences (due to floating point precision) are observable.

My question now is: how do I test, or rather compare, such matrices and vectors? It seems a bit odd to me to set the tolerance to something like "2.0E80".

Would be great if someone could help me out!

Best,
Kevin
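
P.S. Here is a minimal sketch of what a bit-distance comparison of two doubles looks like. This is only an illustration of the idea, not the actual SystemDS code; the real compareMatricesBitAvgDistance works on whole matrices and may differ in its details. The class and helper names and the chosen example values are just for demonstration.

public class BitDistanceSketch {

    // Distance between two doubles measured on their raw bit representation.
    // For finite values of the same sign this roughly counts how many
    // representable doubles lie between them, i.e. a relative tolerance.
    static long bitDistance(double a, double b) {
        long x = Double.doubleToLongBits(a);
        long y = Double.doubleToLongBits(b);
        return Math.abs(x - y);
    }

    public static void main(String[] args) {
        // One of the determinant pairs from the mail below.
        double a = 1.2390121975770675E14;
        double b = 1.2390101941279517E14;
        long maxDistance = 1L << 14; // start with 2^14 trailing bits of slack
        long dist = bitDistance(a, b);
        System.out.println("bit distance = " + dist
                + ", accepted: " + (dist <= maxDistance));
    }
}

If a pair deviates by far more than the allowed number of trailing bits, the check fails, so the allowed distance directly expresses how much relative slack you accept, independent of the magnitude of the values.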

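And for reference, a generic Kahan (compensated) summation in Java. This is the textbook algorithm, not SystemDS's internal implementation; it only shows why a compensated sum can retain more precision than a plain running sum and therefore drift away from what R computes.

public class KahanSketch {

    // Textbook Kahan summation: carries a running correction term so that
    // low-order bits lost in each addition are fed back into the next one.
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double correction = 0.0; // compensation for lost low-order bits
        for (double v : values) {
            double y = v - correction;   // apply the previous correction
            double t = sum + y;          // low-order bits of y may be lost here
            correction = (t - sum) - y;  // recover what was lost
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] vals = new double[1_000_000];
        java.util.Arrays.fill(vals, 0.1);
        double naive = 0.0;
        for (double v : vals)
            naive += v;
        System.out.println("naive: " + naive + ", kahan: " + kahanSum(vals));
    }
}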