Re: [PR] [RFC] Add special case for textNumberPattern with a '0' pad character [daffodil]

via GitHub Wed, 10 Jan 2024 08:31:46 -0800


stevedlawrence commented on code in PR #1138:
URL: https://github.com/apache/daffodil/pull/1138#discussion_r1447631188



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/EvTextNumber.scala:
##########
@@ -186,6 +186,31 @@ class TextNumberFormatEv(
       }
     }
 
+    // If textNumberPattern specifies a pad character before the number pattern and without a
+    // positive prefix, then ICU defaults to a pad position of PAD_BEFORE_PREFIX with no way to
+    // change it with just the pattern. This is reasonable for most cases, like when the pad
+    // character is a space. However, if the pad character in textNumberPattern is '0', then
+    // negative numbers are padded with a '0' before the negative sign. For example, a pattern
+    // of "*0####0" unparses -123 to "0-123". This is very unlikely to be what the user wants
+    // with this pattern.
+    //
+    // So in this very specific case, we change the pad position to PAD_AFTER_PREFIX so the zero
+    // pad character appears after the negative sign, e.g. "-0123". Note that the check for
+    // format width > 0 is used to test if padding is enabled at all--there is no specific API
+    // for this, and this is how ICU determines to add padding or not.
+    //
+    // If a user really wants '0' characters to the left of the negative sign, they can use
+    // textPadKind/textTrimKind and textNumberPadCharacter to use Daffodil's padding logic
+    // instead of ICU's.
+    if (

Review Comment:
   Could this run into issues related to parsing, since ICU only uses the 
positive pattern for parsing?
   
   For example, with the pattern `*000000;-*00000` (note that we would need the 
pad char defined in both pos and neg patterns, since ICU only gets pad from the 
positive pattern) ICU will require 5 digits even for negative numbers when 
parsing since the length comes from the positive pattern. Only negative prefix 
and suffix come from the negative pattern. So this approach would unparse 
negative numbers with 4 digits but parse would expect 5 digits.
   
   I guess this could be resolved by using this as your pattern: 
`*0####0;-*0####0`? So on parsing, it requires 5 characters total, including 
the minus sign for negative numbers. And unparsing negative numbers would have 
the padding on the correct side of the minus sign since there's a positive 
prefix for a negated negative number. Though, parsing this would still expect 
padding on the left side of the negative sign.
   
   This also could get confusing if someone uses very different positive and 
negative patterns. Parse would only accept the positive pattern (plus 
prefix/suffix from negative pattern), but unparse would unparse with the 
negative pattern. Feels like that could be confusing.
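   
   For reference, the two pad positions being discussed can be modelled without ICU. The sketch below is a toy model, not ICU or Daffodil code: `PadPositionSketch`, `pad`, and the `PadPosition` case objects are hypothetical names that only mirror the idea behind ICU's `PAD_BEFORE_PREFIX` and `PAD_AFTER_PREFIX`, and it assumes a fixed format width of 5 as in `*0####0`.
   
   ```scala
   // Toy model of the two prefix-side pad positions. This is NOT ICU code;
   // all names here are hypothetical and exist only for illustration.
   object PadPositionSketch {
     sealed trait PadPosition
     case object PadBeforePrefix extends PadPosition
     case object PadAfterPrefix extends PadPosition

     // Pads prefix + digits out to `width` characters using `padChar`,
     // placing the padding on the side of the prefix selected by `pos`.
     def pad(
       prefix: String,
       digits: String,
       width: Int,
       padChar: Char,
       pos: PadPosition
     ): String = {
       val missing = math.max(0, width - prefix.length - digits.length)
       val padding = padChar.toString * missing
       pos match {
         case PadBeforePrefix => padding + prefix + digits
         case PadAfterPrefix => prefix + padding + digits
       }
     }

     def main(args: Array[String]): Unit = {
       println(pad("-", "123", 5, '0', PadBeforePrefix)) // 0-123
       println(pad("-", "123", 5, '0', PadAfterPrefix)) // -0123
       println(pad("", "123", 5, '0', PadBeforePrefix)) // 00123
     }
   }
   ```
   
   With an empty prefix the two positions produce the same text, which is why the choice of pad position only shows up for negative numbers.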



