Mike Beckerle created DAFFODIL-2870:
---------------------------------------
Summary: textNumberPattern negative part should get warning if it
specifies ignored pattern characters
Key: DAFFODIL-2870
URL: https://issues.apache.org/jira/browse/DAFFODIL-2870
Project: Daffodil
Issue Type: Improvement
Components: Front End
Affects Versions: 3.6.0
Reporter: Mike Beckerle
The dfdl:textNumberPattern property is an ICU number format pattern.
It has a positive part and an optional negative part, separated by a ";".
ICU documents that the negative part is used only to define the negative sign
indication. So for example "#;-#" means a hyphen is used as a minus sign. The
pattern "#;(#)" means negative values are surrounded by parentheses. Other
pattern characters can be present, but are ignored except for indicating where
these sign indicator characters are relative to the digits of the number.
So for example "00000;-#" means -12 would be formatted as -00012, because the
number of digits is taken from the positive pattern. This pattern means the
same:
"00000;-######00000000"
The only significance of the "######00000000" string in the negative pattern is
to indicate that the hyphen/minus appears before the digits. In fact, any number
specifier like "##,###,##0.00###" in a negative pattern is ignored and really
should just be written as a single "0" or "#" character. Other things like the
ICU pad character specifier, if they appear in the negative pattern, are
ignored as well, even though they could be useful.
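The rule can be checked with a short sketch. Note the assumption: this uses the
JDK's java.text.DecimalFormat as a stand-in, not ICU, because the JDK documents
the same rule (an explicit negative subpattern only supplies the negative
prefix and suffix). The object name NegativePartDemo is hypothetical, and
ICU-only features such as the pad specifier are not covered.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// Hypothetical helper for illustration; uses the JDK DecimalFormat, NOT ICU.
object NegativePartDemo {
  // Fix the symbols so the output does not depend on the default locale.
  private val symbols = DecimalFormatSymbols.getInstance(Locale.US)

  /** Formats `value` using `pattern` (positive part, optional ";" negative part). */
  def format(pattern: String, value: Double): String =
    new DecimalFormat(pattern, symbols).format(value)
}
```

Under that assumption, NegativePartDemo.format("00000;-#", -12) and
NegativePartDemo.format("00000;-######00000000", -12) both give "-00012": the
digit specifiers in the negative part change nothing.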
The fact that these are allowed, yet ignored, is unintuitive, misleading, and
error-prone, because users are not going to realize that almost everything
about the negative pattern gets ignored.
Daffodil should warn if the negative pattern contains anything other than a
prefix, a single "#" or "0" character, and a suffix.
The warning message should say that the negative pattern is used only to
specify the prefix and suffix that indicate negative values, and that
everything else is ignored.
Ideally we should parse the negative pattern syntax and point out all the
ignored parts.
This warning should be suppressable via the usual WarnID mechanism.
We should consider having a tunable or property which if set escalates this
warning to a SchemaDefinitionError.
Honestly I think the only meaningful negative patterns are probably:
* "-#"
* "(#)"
* "#-"
With minor variations which insert spaces such as:
* "- #" (a space was added after the sign)
* "( # )" (spaces between digits and parens)
* "# -" (a space before the trailing sign)
Here's an example of a complex dfdl:textNumberPattern which makes the point
that the negative pattern is just a kind of trivial tail end.
```
dfdl:textNumberPattern="+ *x#, ###,##0.00;- #"
```
The only contribution of the negative part of that pattern is that "- " (hyphen
and space) is used as the prefix for negative values. The rest all comes from
the positive pattern.
The value -1234.5 would unparse as `"- xxxx1,234.50"`
If instead the user writes:
```
dfdl:textNumberPattern="+ *x#, ###,##0.00;- *x#,###,##0.00"
```
The warning should be issued and state that the negative part of this pattern
is mostly ignored. Only the `"- "` is significant, and the negative part could
be just `"- #"`, so the whole pattern could be shortened to
`"+ *x#, ###,##0.00;- #"`.
This requires a simple parse of the negative pattern to identify the
significant parts, but this is quite easy. (Look up Scala Regex Pattern
Matching.)
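A minimal sketch of such a parse, using Scala regex pattern matching, might
look like the following. This is not Daffodil code: the names
NegativePatternCheck, NegativeParts, analyze, and shortened are hypothetical,
and the regex is deliberately simplified. It assumes no quoted (apostrophe)
characters and no unquoted "*" or digits in the prefix, and it only recognizes
a pad specifier that sits directly before the digits.

```scala
// Hypothetical sketch, not Daffodil's implementation.
object NegativePatternCheck {

  /** Significant prefix/suffix plus the pieces of the negative part that are ignored. */
  final case class NegativeParts(prefix: String, ignored: List[String], suffix: String)

  // Groups: prefix, optional pad specifier ("*" plus pad char), number specifier, suffix.
  // Simplification: the prefix may not contain '#', digits, '@', or '*'.
  private val NegPart = """([^#0-9@*]*)(\*.)?([#0-9@,.]+)(.*)""".r

  /** Returns None when there is no negative part or it does not fit the simplified syntax. */
  def analyze(textNumberPattern: String): Option[NegativeParts] =
    textNumberPattern.split(";", 2) match {
      case Array(_, NegPart(prefix, pad, body, suffix)) =>
        val ignoredPad = Option(pad).toList // unmatched optional group is null
        val ignoredBody = if (body == "#" || body == "0") Nil else List(body)
        Some(NegativeParts(prefix, ignoredPad ++ ignoredBody, suffix))
      case _ => None
    }

  /** When something is ignored, returns the equivalent pattern with a minimal negative part. */
  def shortened(textNumberPattern: String): Option[String] =
    analyze(textNumberPattern).filter(_.ignored.nonEmpty).map { parts =>
      val positive = textNumberPattern.takeWhile(_ != ';')
      positive + ";" + parts.prefix + "#" + parts.suffix
    }
}
```

With this sketch, a warning would be issued whenever shortened returns a value,
and the ignored list could be quoted in the message to point out each ignored
part. A real implementation would also need to handle quoting and the other
pad positions.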