stevedlawrence commented on code in PR #1138:
URL: https://github.com/apache/daffodil/pull/1138#discussion_r1447631188
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/EvTextNumber.scala:
##########
@@ -186,6 +186,31 @@ class TextNumberFormatEv(
}
}
+ // If textNumberPattern specifies a pad character before the number
pattern and without a
+ // positive prefix, then ICU defaults to a pad position of
PAD_BEFORE_PREFIX with no way to
+ // change it with just the pattern. This is reasonable for most cases,
like when the pad
+ // character is a space. However, if the pad character in
textNumberPattern is '0', then
+ // negative numbers are padded with a '0' before the negative sign. For
example, a pattern
+ // of "*0####0" unparses -123 to "0-123". This is very unlikely to be what
the user wants
+ // with this pattern.
+ //
+ // So in this very specific case, we change the pad position to
PAD_AFTER_PREFIX so the zero
+ // pad character appears after the negative sign, e.g. "-0123". Note that
the check for
+ // format width > 0 is used to test if padding is enabled at all--there is
no specific API
+ // for this, and this is how ICU determines to add padding or not.
+ //
+ // If a user really wants '0' characters to the left of the negative sign,
they can use
+ // textPadKind/textTrimKind and textNumberPadCharacter to uses Daffodils
padding logic
+ // instead of ICUs.
+ if (
Review Comment:
Could this run into issues related to parsing, since ICU only uses the
positive pattern for parsing?
For example, with the pattern `*000000;-*00000` (note that we would need the
pad char defined in both pos and neg patterns, since ICU only gets pad from the
positive pattern) ICU will require 5 digits even for negative numbers when
parsing since the length comes from the positive pattern. Only negative prefix
and suffix come from the negative pattern. So this approach would unparse
negative numbers with 4 digits but parse would expect 5 digits.
I guess this could be resolved by using this as your pattern:
`*0####0;-*0####0`? So on parsing, it requires 5 characters total, including
the minus sign for negative numbers. And unparsing negative numbers would have
the padding on he correct side of the minus sign since there's a positive
prefix for a negated negative number. Though, parsing this would still expect
padding on the left side of the negative sign.
This also could get confusing if someone uses very different positive and
negative patterns. Parse would only accept the positive pattern (plus
prefix/suffix from negative pattern), but unparse would unparse with the
negative pattern. Feels like that could be confusing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]