Hi Kevin,

Great question, and thanks for posting it to the mailing list!


When comparing floating point values, I would suggest using our "distance in 
bits" comparison for matrices containing double values. This lets you specify 
a relative difference between the values, rather than the typical double 
comparison with an epsilon set to an exact value.


You can find the method to compare in:

File:    src/test/java/org/apache/sysds/test/TestUtils.java

Method:    compareMatricesBitAvgDistance.


Note that the bit distance is a long that specifies how many of the trailing 
bits of the two encoded double values are allowed to differ. It can be any 
value in the positive long range, from Long.MAX_VALUE, meaning totally 
different values are accepted, down to 0, meaning the encoded double values 
must be exactly the same. I would suggest trying 2^14 to start with.
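
To give an intuition for what the bit distance measures, here is a minimal, 
self-contained sketch in plain Java. This is only an illustration of the idea, 
not the actual SystemDS implementation; the class and method names are made up:

public class BitDistanceSketch {

    // Distance between two doubles measured on their IEEE 754 long encodings.
    // For finite doubles of the same sign, this is the number of representable
    // doubles lying between them (a ULP-style distance); 0 means identical bits.
    static long bitDistance(double expected, double actual) {
        long e = Double.doubleToLongBits(expected);
        long a = Double.doubleToLongBits(actual);
        return Math.abs(e - a);
    }

    public static void main(String[] args) {
        double expected = 1.510440018532407E85;
        double actual = expected * (1.0 + 1e-13);  // small relative error
        long maxDistance = 1L << 14;               // 2^14 trailing bits of slack

        long d = bitDistance(expected, actual);
        System.out.println("bit distance: " + d
                + ", within tolerance: " + (d <= maxDistance));
    }
}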


It is normal for values to be off by something like 2.0E80 when the values we 
are talking about are of that order of magnitude, so it is okay for those tests 
to use an epsilon like that. Furthermore, in SystemDS we use Kahan correction 
for our double aggregations, which compensates for rounding errors beyond what 
a single 64-bit double can represent. This correction can make the values 
deviate from a naive computation after a number of operations, such that the 
difference becomes more exaggerated.
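
If you are curious, the Kahan trick itself is the standard compensated 
summation scheme. Here is a generic sketch in plain Java, again just an 
illustration and not SystemDS code:

public class KahanSumSketch {

    public static void main(String[] args) {
        double[] values = new double[1_000_000];
        java.util.Arrays.fill(values, 0.1);

        // Naive summation accumulates noticeable rounding error.
        double naive = 0.0;
        for (double v : values) {
            naive += v;
        }

        // Kahan summation: the correction term keeps track of the low-order
        // bits that the plain double sum would otherwise lose.
        double sum = 0.0;
        double correction = 0.0;
        for (double v : values) {
            double corrected = v - correction;
            double tentative = sum + corrected;  // low-order bits may be lost here
            correction = (tentative - sum) - corrected;
            sum = tentative;
        }

        System.out.println("naive sum: " + naive);  // drifts from the exact result
        System.out.println("kahan sum: " + sum);    // much closer to the exact result
    }
}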


Best regards

Sebastian Baunsgaard



________________________________
From: Kevin Pretterhofer <[email protected]>
Sent: Wednesday, January 13, 2021 12:52:47 PM
To: [email protected]
Subject: [Question] Regarding test cases and floating point errors

Hi all,

I hope this is the right place to ask questions. If not, I am sorry, and
it would be nice if you could direct me to the right place.

So my question is about the unit tests. Currently I am implementing a
simple Gaussian classifier. Besides the class prior probabilities,
this implementation also outputs the respective mean values,
determinants, and covariance matrices, as well as their inverses.

Now I face the problem that the values of my SystemDS implementation
and my R implementation are quite far off for randomly generated test
matrices. I assume that this is due to floating point errors / floating
point precision. At first glance they look quite similar, but since the
output is in scientific notation,
one can clearly see that the magnitude by which they are off is quite
large. E.g. for my determinant comparison I got the following:

(1,1): 1.2390121975770675E14 <--> 1.2390101941279517E14
(3,1): 1.510440018532407E85 <--> 1.5104388050968705E85
(2,1): 1.6420264128994816E38 <--> 1.6420263615987703E38
(5,1): 8.881025000211518E70 <--> 8.881037540234089E70
(4,1): 1.7888589555748764E22 <--> 1.78885700537877E22

I face similar issues with the inverses of my covariance matrices.

Since I use the eigenvalues and eigenvectors to calculate the
determinant and the inverse in SystemDS, I already compared them to the
eigenvalues and eigenvectors that R
computes, and already there, differences (due to floating point
precision) are observable.

My question now is how to test, or rather compare, such
matrices and vectors.
It seems a bit odd to me to set the tolerance to something like
"2.0E80" or so.

Would be great if someone could help me out!

Best,
Kevin
