Re: Comparing Floating Point numbers

Steve Lawrence Mon, 25 Sep 2023 07:54:58 -0700

When doing floating point math, epsilon comparisons are pretty muchmandatory due to loss of precision.

But the TDML runner isn't doing math, we know *exactly* what valueshould be written to the infoset so we should be able to do exactcomparisons.

Java 19 didn't changed the precision of what it outputs (as far as Iunderstand). It just uses different decimal representations, but theystill map the exact same float that we've always expected. So thischange just means we can't do string comparisons anymore, we must parsethe strings to floats and do floating point comparisons.



On 2023-09-25 10:10 AM, Sood, Harinder wrote:

Floating point comparison is typically done with +- delta as Mike suggested.


Sincerely,
  Harinder Sood


  Senior Program Mnager
   hs...@owlcyberdefense.com
   240 805 4219
   owlcyberdefense.com

The information contained in this transmission is for the personal and 
confidential use of the individual or entity to which it is addressed. If the 
reader is not the intended recipient, you are hereby notified that any review, 
dissemination, or copying of this communication is strictly prohibited. If you 
have received this transmission in error, please notify the sender immediately.

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, September 25, 2023 10:01 AM
To: dev@daffodil.apache.org
Subject: Re: Comparing Floating Point numbers

Based on what I've learned today from reading that bug, I think in Java at least, float 
-> string -> float will give you back the exact same float. Note that string -> 
float -> string could give a different result, since multiple strings can map to the 
same float, but a float maps to only a single string.

Java doesn't actually round when converting a float to a string. Instead it finds the shortest 
string representation that can be uniquely mapped back to the original float. The bug I linked 
below is about how Java didn't find the shortest. For example, "1.0E23" and 
"9.999999999999999E22" both map to the exact same float. Older versions of Java output 
the latter, newer version of Java output the former.

But even though it didn't output the shortest, and are different decimal 
representations, both of those strings uniquely map back to the exact same 
float.

So although users must be aware that float values in the infoset might not be 
accurate to the original value, we can be guaranteed that if an infoset string 
is converted to a Java float, it will at least be the exact same float value 
that Daffodil wrote to the infoset.

And so for the purposes of TDML comparisons, it should all be exactly the same.

For these reasons, I don't actually think creating a special raw representation 
actually fixes anything. The decimal string already maps back to the original 
float exactly. The real issues with floats is when you start doing math, or the 
original data was a string. That's when you start losing precision and get 
funky results. I don't there are any actual issues with the infoset conversion 
itself.

On 2023-09-25 08:57 AM, McGann, Mike wrote:

One thing to note is that by putting a value in a TDML document such as 
"12.34e56" it is actually a string. Comparing that to a floating-point value is 
going to require a conversion from string and that could invoke a rounding step if it 
cannot be accurately represented by a float. If you really want to compare two floating 
points exactly, using a binary representation is probably the best such as putting in 
something like 0x1234p56. At least that is how I think I understand it. Floating point 
math is a deep rabbit hole that can be followed. That is probably overkill for TDML.

// Mike

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, September 25, 2023 08:07
To: dev@daffodil.apache.org
Subject: Re: Comparing Floating Point numbers

+1 for type aware comparisons. It should be a very small change to
+this
function:

https://github.com/apache/daffodil/blob/main/daffodil-lib/src/main/sca
la/org/apache/daffodil/lib/xml/XMLUtils.scala#L1098

And just need to add xsi:type to a few expected infosets that are
sensitive to the issue.

Note that I *think* this might be the bug that caused the change:

https://bugs.openjdk.org/browse/JDK-4511638

Based on that, it sounds like the issue is that Java wasn't creating
the shortest possible decimal representation, but the representation
it did create still parses back to the same floating point
representation. So we *probably* don't even really need epsilon
comparison, we just need type aware comparison, and can still expect
the floating point values to be exactly the same.

Although epsilon comparison is the right way to compare floats, my
concern is that we might add some bug in Daffodil where we do math
wrong and end up with a very very very slightly wrong answer and it
would be hidden. But if our epsilon is small enough, maybe that amount
precision error is fine?

Note that according to that JDK issue, the change was made in Java 19,
so if we add any conditional logic on java version, we should check if
it's at least 19. I guess if we do need epsilon comparisons we could
only do it for java 19 and newer. Older versions would expect exact
values and so would catch any off by very very small amount bugs. That
might be adding unnecessary complication though.


On 2023-09-24 12:09 PM, Mike Beckerle wrote:

So Java 21 produces different floating point values in a few cases.
Some of our tests (4) are sensitive to this.

The "right way" to compare floating point numbers is like this:

If(Math.abs(A - B) < epsilon)

The TDML runner has outstanding bug
https://issues.apache.org/jira/browse/DAFFODIL-2402 which is to add
the ability to put xsi:type="double" for example on the expected
infoset, and this instructructs the (schema unaware) TDML runner to
do comparison using some sort of epsilon comparison like the above

Does that seem like the right fix for this?

The only alternative I can think of is some sort of conditional
infoset construction, so that the expected values can vary for different JVMs.

On Sat, Sep 23, 2023 at 2:13 PM Mike Beckerle <mbecke...@apache.org> wrote:


JVM 21 LTS is now out.

So I decided to try to building Daffodil using it. My WIP PR is
https://github.com/apache/daffodil/pull/1090

It looks pretty close.

The --release 8 option for javac is now deprecated. So I
conditionalized that.
Fixed some deprecated calls.

Remaining issues:

2 more deprecated calls (hence fatal warnings turned off for now)

5 tests fail. One each in these 3 test classes

org.apache.daffodil.TresysTests.test_BG000

org.apache.daffodil.section13.text_number_props.TestTextNumberProps.
test_textNumberPattern_exponent01


org.apache.daffodil.section05.simple_types.TestSimpleTypes.test_doub
le_binary_06

All 3 of those failures are floating point related like this:
(highlighted digit isn't output any more)

[error] Expected (attributes stripped)
[error]    <d_02>9.8765432109876544E16</d_02>
[error] Actual (attributes ignored for diff) [error]
<ex:d_02>9.876543210987654E16</ex:d_02>

The Expected has one more digit 4 at the end.

1 other test failure is for reasons unknown. Possible change in
regex behavior?


org.apache.daffodil.io.layers.TestJavaIOStreams.testBase64ScanningFo
rDelimiter1

One CLI test failure:


org.apache.daffodil.cli.cliTest.TestEXIEncodeDecode.test_CLI_Encode_
Decode_EXI

Re: Comparing Floating Point numbers

Reply via email to