Re: [PR] Fix xs:float text number parse/unparse edge cases [daffodil]

via GitHub Wed, 03 Jan 2024 12:14:44 -0800


stevedlawrence commented on code in PR #1133:
URL: https://github.com/apache/daffodil/pull/1133#discussion_r1440867828



##########
daffodil-runtime1-unparser/src/main/scala/org/apache/daffodil/unparsers/runtime1/ConvertTextStandardNumberUnparser.scala:
##########
@@ -86,7 +89,22 @@ case class ConvertTextNumberUnparser(
       case n: JNumber if zeroRep.isDefined && isZero(n) => zeroRep.get
       case _ =>
         try {
-          df.format(scaledValue)
+          if (
+            (context.optPrimType.get eq NodeInfo.Float) &&
+            JFloat.isFinite(scaledValue.asInstanceOf[Float])
+          ) {
+            // ICU4J has has no format() function that handles a float. 
Instead, ICU4J converts
+            // the float to a double, and formats that value. However, 
formatting a double as a
+            // string has different precision than formatting a float as a 
string, resulting in
+            // extra precision that implies more accuracy than actually exists 
for floats, and
+            // can also lead to failures to exactly round. To solve this, we 
convert the float
+            // to a String, convert that String to a BigDecimal, and then 
format that. This is
+            // more expensive, but ensures we unparse the exact same precision 
as represented by
+            // the float.

Review Comment:
   > Does constructing BigDecimal(string) work if the string uses exponential 
notation as in "3.4E38" ?
   
   Yep, here's the grammar it supports: 
https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#BigDecimal-java.lang.String-
   
   > If so why didn't they provide a constructor that takes a float?
   
   ICU "natively" supports `BigInteger`, `BigDecimal`, `long`, and `double`. 
Any `Number` that isn't one of those is converted to a `double` (via 
`Number.doubleValue`) and formats it using the `double` logic:
   
   
https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/NumberFormat.java#L269-L289
   
   This is fine for int/short/byte since they can all be exactly represented by 
a double. And it means float is supported, by when a float is converted to a 
double, more decimal places are sometimes needed when converted to a string. 
This Java and not ICU, but it shows the same issue more clearly:
   
   ```scala
   scala> Float.MaxValue.toString
   val res0: String = 3.4028235E38
   
   scala> Float.MaxValue.doubleValue.toString
   val res1: String = 3.4028234663852886E38
   ```
   So ICU's logic of converting to a double is maybe questionable--you can't 
just convert a float to a double and expect to get the same string. Note that 
if you then parse that double string as a float you should get the original 
float back, so the thing in the infoset should come out the same, but the data 
doesn't round trip.
   
   I couldn't find a way to convert a float to a double in a way that would 
result in the same string value (without doing what I did), though it does seem 
like a better algorithm should exist.
   
   > We could bypass all of this if the float is an integer...Nah. Not worth 
it...
   
   Also, if a user is expecting floats, it probably is less likely integers to 
appear in the data/infoset, so it might be an optimization that isn't hit that 
often.
   
   Probably the best thing to do is suggest that if you're dealing with text 
numbers containing decimals, then just always use xs:double--its much less 
likely to have precision issues and it avoids this hack and should be more 
performant.



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/dpath/NodeInfo.scala:
##########
@@ -698,6 +706,8 @@ object NodeInfo extends Enum {
       protected override def fromNumberNoCheck(n: Number): DataValueDouble = 
n.doubleValue
       override val min = -JDouble.MAX_VALUE
       override val max = JDouble.MAX_VALUE
+      override val minStr = "-" + JDouble.MAX_VALUE.toString

Review Comment:
   Yeah, the `min`/`max` vals are `Doubles`, and `Double.toString` results in 
more decimal places in the string. When we create the BigDecimal for range 
comparison, we can't have those extra bits of precision or it changes the valid 
range for a `Float`. For example,
   
   ```scala
   scala> new java.math.BigDecimal(Float.MaxValue.doubleValue.toString)
   val res0: java.math.BigDecimal = 3.4028234663852886E+38
   ```
   That value is much smaller than the actual Float max of `3.4028235E38`--we 
must call toString on the `Float`, e.g.
   
   ```scala
   scala> new java.math.BigDecimal(Float.MaxValue.toString)
   val res0: java.math.BigDecimal = 3.4028235E+38
   ```



##########
daffodil-test/src/test/resources/org/apache/daffodil/section05/simple_types/SimpleTypes.tdml:
##########
@@ -3722,10 +3721,10 @@
     </tdml:infoset>
   </tdml:parserTestCase>
 
-  <tdml:parserTestCase name="float_text_delim" root="floatTextDelim"
+  <tdml:parserTestCase name="float_text_max" root="floatTextDelim"
                        model="SimpleTypes-Embedded.dfdl.xsd" 
description="Section 5 Simple type-float - DFDL-5-008R">
 
-    <tdml:document><![CDATA[3.4028235E38]]></tdml:document>
+    
<tdml:document><![CDATA[340,282,350,000,000,000,000,000,000,000,000,000,000]]></tdml:document>

Review Comment:
   Will do. This was needed to roundtrip--it parses correctly since it's lax 
parsing and supports exponents, but since the textNumberPattern is what you 
say, it unparses without exponents.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix xs:float text number parse/unparse edge cases [daffodil]

Reply via email to