Mike Beckerle created DAFFODIL-2870:
---------------------------------------

             Summary: textNumberPattern negative part should get warning if it 
specifies ignored pattern characters
                 Key: DAFFODIL-2870
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2870
             Project: Daffodil
          Issue Type: Improvement
          Components: Front End
    Affects Versions: 3.6.0
            Reporter: Mike Beckerle


The dfdl:textNumberPattern property is an ICU number format pattern.

It has a positive part and an optional negative part separated by a ";".

ICU documents that the negative part is used only to define the negative sign 
indication. So for example "#;-#" means a hyphen is used as a minus sign. The 
pattern "#;(#)" means negative values are surrounded by parentheses. Other 
pattern characters can be present, but are ignored except for indicating where 
these sign indicator characters are relative to the digits of the number.

So for example "00000;-#" means -12 would be formatted as -00012 because the 
number of digits is taken from the positive pattern. This pattern means the 
same:

"00000;-######00000000"

The only significance of the "######00000000" string in the negative pattern is 
to indicate that the hyphen/minus appears before the digits. In fact any number 
specifier like "##,###,##0.00###" in a negative pattern is ignored and really 
should just be written as a single "0" or "#" character.  Other things like the 
ICU pad character specifier, if they appear in the negative pattern, are 
ignored as well, even though they could be useful. 
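
To make the behavior concrete, here is a small sketch. It uses the JDK's 
java.text.DecimalFormat rather than ICU, purely so it runs with no extra 
dependencies; that substitution is my assumption, not what Daffodil does. The 
JDK class documents the same rule (the negative subpattern only supplies the 
negative prefix and suffix). The object and method names are made up for 
illustration.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// Illustration only: java.text.DecimalFormat stands in for ICU here.
object NegativePartDemo {
  // Fixed symbols so the result does not depend on the default locale.
  private val symbols = DecimalFormatSymbols.getInstance(Locale.US)

  def format(pattern: String, value: Long): String =
    new DecimalFormat(pattern, symbols).format(value)

  def main(args: Array[String]): Unit = {
    // The digit count comes from the positive part "00000" in both cases.
    println(format("00000;-#", -12))              // -00012
    println(format("00000;-######00000000", -12)) // -00012
    // Here the negative part contributes only the parentheses.
    println(format("#;(#)", -12))                 // (12)
  }
}
```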

The fact that these are allowed, yet ignored, is unintuitive, misleading, and 
error-prone, because users are not going to realize that almost everything 
about the negative pattern gets ignored.

Daffodil should warn if the negative pattern contains anything other than a 
prefix, a single "#" or "0" character, and a suffix. 

The warning message should say the negative pattern is only used to specify the 
prefix and suffix used to indicate negative values. Everything else is ignored. 
Ideally we should parse the negative pattern syntax and point out all the 
ignored parts. 
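
A rough sketch of what such a check might look like follows. Everything in it 
is hypothetical: the object name, the method names, and the message wording are 
not existing Daffodil code. It also assumes a simplified negative-pattern 
shape: a prefix, an optional pad specifier right after the prefix, a number 
part made only of "#", "0", "," and ".", and a suffix. Quoted characters, 
exponents, and pad specifiers in other positions are not handled.

```scala
// Hypothetical helper, not part of Daffodil.
object NegativePatternCheck {
  // Assumed shape: prefix, optional pad specifier (e.g. *x), number part, suffix.
  private val NegPart = """([^#0*]*)((?:\*.)?)([#0][#0,.]*)(.*)""".r

  /** Pieces of the negative part that are ignored, or None if this
    * simplified sketch cannot parse the negative part at all. */
  def ignoredParts(negative: String): Option[List[String]] = negative match {
    case NegPart(_, pad, number, _) =>
      val ignoredPad = if (pad.nonEmpty) List(pad) else Nil
      // A single "#" or "0" is the minimal form, so it is not reported.
      val ignoredNumber = if (number.length > 1) List(number) else Nil
      Some(ignoredPad ++ ignoredNumber)
    case _ => None
  }

  /** Warning text when something is ignored, otherwise None. */
  def warning(negative: String): Option[String] =
    ignoredParts(negative).filter(_.nonEmpty).map { parts =>
      "The negative part of textNumberPattern is only used to specify the " +
        "prefix and suffix for negative values. Ignored: " +
        parts.map(p => "\"" + p + "\"").mkString(", ")
    }
}
```

With this sketch, "- #" and "(#)" produce no warning, while 
"- *x#,###,##0.00" reports both the pad specifier and the number part.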

This warning should be suppressible via the usual WarnID mechanism. 

We should consider having a tunable or property which if set escalates this 
warning to a SchemaDefinitionError. 

Honestly I think the only meaningful negative patterns are probably:
 * "-#"
 * "(#)"
 * "#-"

With minor variations which insert spaces such as:
 * "- #" (a space was added after the sign)
 * "( # )" (spaces between digits and parens)
 * "# -" (a space before the trailing sign)

Here's an example of a complex dfdl:textNumberPattern which makes the point 
that the negative pattern is just a kind of trivial tail end. 

```

dfdl:textNumberPattern="+ *x#, ###,##0.00;- #"

```

The only contribution of the negative part of that pattern is that "- " (hyphen 
and space) is used as the prefix for negative values. The rest all comes from 
the positive pattern. 

The value -1234.5 would unparse as `"- xxxx1,234.50"` 

If instead the user writes:

```

dfdl:textNumberPattern="+ *x#, ###,##0.00;- *x#,###,##0.00"

```

The warning should be issued and state that the negative part of this pattern 
is mostly ignored. Only the `"- "` is significant, and the negative part could 
be just `"- #"`, so the whole pattern shortened to `"+ *x#, ###,##0.00;- #"`.

This requires a simple parse of the negative pattern to identify the 
significant parts, but this is quite easy. (Look up Scala regex pattern 
matching.)
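
As a sketch of that parse, a Scala regex extractor can pull out the prefix and 
suffix and rebuild the shortened pattern. The names here are hypothetical, and 
the sketch makes simplifying assumptions: the first ";" separates the two parts 
(no quoted ";"), a pad specifier can only follow the prefix, and the number 
part uses only "#", "0", "," and ".". It always emits "#" as the placeholder 
digit.

```scala
// Hypothetical helper, not part of Daffodil.
object ShortenNegativePart {
  // Captures only the significant pieces: the prefix and the suffix.
  private val NegPart = """([^#0*]*)(?:\*.)?[#0][#0,.]*(.*)""".r

  /** Rewrites "positive;negative" so the negative part is just
    * prefix + "#" + suffix. Returns the input unchanged when there is
    * no negative part or the sketch cannot parse it. */
  def shorten(pattern: String): String = pattern.split(";", 2) match {
    case Array(positive, NegPart(prefix, suffix)) =>
      positive + ";" + prefix + "#" + suffix
    case _ => pattern
  }
}
```

Applied to the second pattern above, 
`ShortenNegativePart.shorten("+ *x#, ###,##0.00;- *x#,###,##0.00")` returns 
`"+ *x#, ###,##0.00;- #"`, which is the form the warning could suggest.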



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
