[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinity/NaN correctly

via GitHub Wed, 08 Mar 2023 10:23:24 -0800


stevedlawrence commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1129861918



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/ConvertTextStandardNumberParser.scala:
##########
@@ -171,47 +185,91 @@ case class ConvertTextStandardNumberParser(
       case Some(_) => primNumeric.fromNumber(0)
       case None => {
         val df = textNumberFormatEv.evaluate(start)
-        val strCheckPolicy = if (df.isParseStrict) str else str.trim
+        val strToParse = if (df.isParseStrict) str else str.trim
         val pos = new ParsePosition(0)
-        val icuNum: Number = df.parse(strCheckPolicy, pos) match {
+        val icuNum: JNumber = df.parse(strToParse, pos) match {
           case null => {
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            val infNaN: JDouble =
+              if (df.isDecimalPatternMatchRequired) {
+                // ICU failed to parse. But there is a bug in ICU4J (ICU-22303) that if there is
+                // a decimal in the pattern and we've set that decimal to be required (due to
+                // strict mode), then it will fail to parse Inf/NaN representations. As a
+                // workaround, we clone the DecimalFormat, disable requiring the decimal, and
+                // reparse. We only accept successful Inf/NaN parses though--everything else is
+                // considered a parse error since it meant the decimal point was missing or
+                // wasn't either inf/nan or a valid number. If ICU fixes this bug, we should
+                // remove this infNan variable and its use, as it is likely pretty expensive to
+                // clone, change a setting, and reparse. Fortunately, it is only in the error
+                // case of strict parsing so should be rare.
+                pos.setIndex(0)
+                val newDF = df.clone().asInstanceOf[DecimalFormat]
+                newDF.setDecimalPatternMatchRequired(false)
+                newDF.parse(strToParse, pos) match {
+                  case d: JDouble => {
+                    Assert.invariant(d.isNaN || d.isInfinite)
+                    d
+                  }
+                  case _ => null
+                }
+              } else {
+                null
+              }
+
+            if (infNaN != null) {
+              infNaN
+            } else {
+              PE(
+                start,
+                "Unable to parse %s from text: %s",
+                context.optPrimType.get.globalQName,
+                str,
+              )
+              return
+            }
           }
-          case d: JDouble if primNumeric.isInteger => {
-            // If ICU returns a Double when only integers are expected, it means the
-            // string must have been NaN, -Infinity, or Infinity using the locales
-            // default symbols. There does not seem to be a way to disable this even
-            // with setParseBigDecimal to false and setParseIntegerOnly set to true.
-            // So just create the same PE as if it failed to parse it, which is what
-            // we really want ICU to do
+          case d: JDouble => {
+            // ICU returns a Double only if it parsed NaN, Infinity, or -Infinity. We will later
+            // pass this value to primNumeric.fromNumber, which will fail if the primitive type
+            // does not allow NaN/Infinity
             Assert.invariant(d.isNaN || d.isInfinite)
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            d
           }
           case bd: ICUBigDecimal => {
-            // sometimes ICU will return their own custom BigDecimal, even if the
-            // value could be represented as a BigInteger. We only want Java types,
-            // so detect this and convert it to the appropriate type
+            // ICU will return their own custom BigDecimal if the value cannot fit in a Long and
+            // isn't infinity/NaN. We only want Java types, so detect this and convert it to the
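The ICU-22303 fallback described in the diff (strict parse fails, reparse without the decimal requirement, keep the result only if it is Inf/NaN) can be sketched in plain Scala. This is a minimal, hedged illustration, not Daffodil's code: it swaps in the JDK's `java.text.DecimalFormat` for ICU4J's `DecimalFormat`, and because the JDK class has no `setDecimalPatternMatchRequired`, the strict "decimal required" rule is emulated by a hypothetical `strictParse` check instead of by cloning and reconfiguring the formatter. `InfNaNFallbackSketch`, `parseAll`, `strictParse`, and `parseWithFallback` are illustrative names. With `setParseBigDecimal(true)`, the JDK returns `java.lang.Double` only for NaN and infinity, which mirrors the `case d: JDouble` branch in the diff.

```scala
import java.lang.{Double => JDouble, Number => JNumber}
import java.text.{DecimalFormat, DecimalFormatSymbols, ParsePosition}
import java.util.Locale

object InfNaNFallbackSketch {

  // Stand-in for the ICU4J formatter: a JDK DecimalFormat with explicit Inf/NaN
  // symbols so the sketch does not depend on locale data.
  def makeFormat(): DecimalFormat = {
    val syms = new DecimalFormatSymbols(Locale.US)
    syms.setNaN("NaN")
    syms.setInfinity("Inf")
    val df = new DecimalFormat("#0.0#", syms)
    // Finite values come back as java.math.BigDecimal; Inf/NaN come back as java.lang.Double.
    df.setParseBigDecimal(true)
    df
  }

  // Parses the entire string; returns None on failure or if characters are left over.
  private def parseAll(df: DecimalFormat, s: String): Option[JNumber] = {
    val pos = new ParsePosition(0)
    val n = df.parse(s, pos)
    if (n == null || pos.getIndex != s.length) None else Some(n)
  }

  // Hypothetical emulation of ICU's "decimal pattern match required" strict mode:
  // the text must contain the decimal separator. Like the ICU-22303 behavior,
  // this also rejects Inf/NaN, which never contain a decimal separator.
  private def strictParse(df: DecimalFormat, s: String): Option[JNumber] = {
    val sep = df.getDecimalFormatSymbols.getDecimalSeparator
    if (s.indexOf(sep.toInt) < 0) None else parseAll(df, s)
  }

  def parseWithFallback(df: DecimalFormat, s: String): Either[String, JNumber] =
    strictParse(df, s) match {
      case Some(n) => Right(n)
      case None =>
        // Strict parse failed: reparse without the decimal requirement, but keep
        // the result only if it is Inf or NaN. Anything else is still an error.
        parseAll(df, s) match {
          case Some(d: JDouble) if d.isNaN || d.isInfinite => Right(d)
          case _ => Left(s"Unable to parse number from text: $s")
        }
    }
}
```

With this setup, `"12.5"` parses to a `java.math.BigDecimal`, `"Inf"`, `"-Inf"`, and `"NaN"` are accepted only through the fallback, and `"12"` is rejected because the required decimal point is missing, matching the behavior the code comment describes.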

Review Comment:
   > Asking only for curiosity. Why does ICU return a custom BigDecimal instead of a BigDecimal?
   
   Why does ICU do anything? It's a mystery wrapped inside an enigma. We just do our best to make it work :)
   
   > Why does the difference matter to Daffodil, that is, why isn't the custom BigDecimal compatible enough?
   
   I'm not really sure what's different and it's not obvious reading the ICU BigDecimal documentation. But we try to use Java types everywhere (avoiding both Scala types and custom ICU types) just so we're consistent.
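The "use Java types everywhere" policy can be illustrated with a small normalization helper. This is a hypothetical sketch (`JavaNumberNormalization` and `toJavaNumber` are not Daffodil APIs): it unwraps Scala's `BigDecimal`/`BigInt` to the `java.math` objects they wrap, passes Java numeric types through unchanged, and rejects anything else. It deliberately leaves out ICU's `com.ibm.icu.math.BigDecimal` so it runs without ICU4J on the classpath; with ICU4J available, that class's `toBigDecimal()` method would be the natural way to get a `java.math.BigDecimal`.

```scala
import java.lang.{Double => JDouble, Long => JLong, Number => JNumber}
import java.math.{BigDecimal => JBigDecimal, BigInteger => JBigInteger}

object JavaNumberNormalization {

  // Hypothetical helper: map a parsed value onto a java.lang / java.math type so that
  // downstream code never sees Scala wrappers. Unknown types are rejected loudly.
  def toJavaNumber(value: Any): JNumber = value match {
    // Scala wrappers: unwrap to the Java objects they hold
    case bd: scala.math.BigDecimal => bd.bigDecimal
    case bi: scala.math.BigInt     => bi.bigInteger
    // Java types pass through unchanged
    case l: JLong        => l
    case d: JDouble      => d
    case bd: JBigDecimal => bd
    case bi: JBigInteger => bi
    // With ICU4J on the classpath, a com.ibm.icu.math.BigDecimal case could call
    // toBigDecimal() here (omitted so the sketch runs on the JDK alone).
    case other =>
      throw new IllegalArgumentException(s"Unexpected numeric type: ${other.getClass.getName}")
  }
}
```

Failing on unknown types, rather than passing them through, keeps a stray Scala or ICU number from silently leaking into code that assumes Java types.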



