[ 
https://issues.apache.org/jira/browse/DAFFODIL-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Beckerle updated DAFFODIL-2870:
------------------------------------
    Description: 
The dfdl:textNumberPattern property is an ICU number format pattern.

It has a positive part and an optional negative part, separated by a ";".

ICU documents that the negative part is used only to define the negative sign 
indication. So for example "#;-#" means a hyphen is used as a minus sign. The 
pattern "#;(#)" means negative values are surrounded by parentheses. Other 
pattern characters can be present, but are ignored except for indicating where 
these sign indicator characters are relative to the digits of the number.

So for example "00000;- #" means -12 would be formatted as "- 00012" because the
number of digits is taken from the positive pattern. Only the `"- "` (hyphen and
space) is taken from the negative pattern. This pattern means the same:

"00000;- ######00000000"

The only significance of the "######00000000" string in the negative pattern is
to indicate that the hyphen and space appear before the digits. In fact any
number specifier like "##,###,##0.00###" in a negative pattern is ignored and
really should just be written as a single "0" or "#" character. Other things
like the ICU pad character specifier, if they appear in the negative pattern,
are ignored as well, even though they could be useful.
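
To illustrate the point above, here is a minimal sketch (Python, purely for illustration; this is not Daffodil or ICU code) that extracts the only parts of the negative pattern that matter. The name `negative_affixes` is hypothetical, and the sketch assumes a simplified pattern syntax with no quoted characters, pad specifiers, or exponents.

```python
import re

# Assumption: the numeric body consists only of digit placeholders,
# grouping separators, and a decimal separator.
NUMBER_BODY = re.compile(r"[#0-9,.]+")

def negative_affixes(pattern):
    """Return (prefix, suffix) from the negative part, or None if absent."""
    _positive, sep, negative = pattern.partition(";")
    if not sep:
        return None  # no explicit negative part was given
    body = NUMBER_BODY.search(negative)
    if body is None:
        raise ValueError("negative part has no number specifier: %r" % negative)
    # Everything before the numeric body is the prefix, everything after is the suffix.
    return (negative[:body.start()], negative[body.end():])

print(negative_affixes("00000;- #"))               # ('- ', '')
print(negative_affixes("00000;- ######00000000"))  # ('- ', '')
print(negative_affixes("#;(#)"))                   # ('(', ')')
```

Both spellings of the negative part yield the same prefix and suffix, which is all that is taken from them.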

The fact that these are allowed, yet ignored, is unintuitive, misleading, and 
error prone, because users are not going to realize almost everything about the 
negative pattern gets ignored.

Daffodil should warn if the negative pattern contains anything other than a
prefix, a single "#" or "0" character, and a suffix.

The warning message should say the negative pattern is only used to specify the 
prefix and suffix used to indicate negative values. Everything else is ignored. 
Ideally we should parse the negative pattern syntax and point out all the 
ignored parts. 
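
A rough sketch of such a check follows (Python, for illustration only; the real implementation would live in Daffodil's schema compiler). The function names and the message wording are hypothetical, and the grammar is simplified: it assumes no quoted characters and treats any `*` followed by one character as a pad specifier.

```python
import re

PAD = re.compile(r"\*.")         # pad specifier: '*' followed by the pad character
BODY = re.compile(r"[#0-9,.]+")  # digits, grouping and decimal separators

def ignored_parts(negative):
    """List the pieces of a negative pattern that have no effect."""
    ignored = PAD.findall(negative)
    without_pad = PAD.sub("", negative)
    body = BODY.search(without_pad)
    # A single "#" or "0" is the minimal placeholder; anything longer is ignored.
    if body is not None and body.group() not in ("#", "0"):
        ignored.append(body.group())
    return ignored

def negative_pattern_warning(pattern):
    """Return a warning string, or None if the negative part is minimal."""
    _positive, sep, negative = pattern.partition(";")
    if not sep:
        return None
    ignored = ignored_parts(negative)
    if not ignored:
        return None
    return ("The negative part of textNumberPattern is only used to specify "
            "the prefix and suffix for negative values. Ignored: "
            + ", ".join(repr(part) for part in ignored))

print(negative_pattern_warning("00000;- #"))  # None
print(negative_pattern_warning("0000;- ######00000000"))
```

Listing the ignored pieces in the message tells the user exactly what to delete.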

This warning should be suppressible via the usual WarnID mechanism.

We should consider having a tunable or property which, if set, escalates this
warning to a SchemaDefinitionError.

Honestly I think the only meaningful negative patterns are probably:
 * "-#"
 * "(#)"
 * "#-"

With minor variations that insert spaces, such as:
 * "- #" (a space was added after the sign)
 * "( # )" (spaces between digits and parens)
 * "# -" (a space before the trailing sign)

Here's an example of a complex dfdl:textNumberPattern which makes the point 
that the negative pattern is just a kind of trivial tail end. 

```

dfdl:textNumberPattern="+ *x#, ###,##0.00;- #"

```

The only contribution of the negative part of that pattern is that "- " (hyphen 
and space) is used as the prefix for negative values. The rest all comes from 
the positive pattern. 

The value negative 1234.5 would unparse as `"- xxxx1,234.50"` 

If instead the user writes:

```

dfdl:textNumberPattern="+ *x#, ###,##0.00;- *x#,###,##0.00"

```

The warning should be issued and state that the negative part of this pattern
is mostly ignored. Only the `"- "` is significant, and the negative part could
be just `"- #"`, so the whole pattern could be shortened to
`"+ *x#, ###,##0.00;- #"`.

This requires a simple parse of the negative pattern to identify the
significant parts, but this is quite easy. (Look up Scala Regex Pattern
Matching.)
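
As a rough idea of how little parsing is needed, here is a sketch of computing the suggested shorter pattern with two regular expressions. It is written in Python for illustration only (the Daffodil version would presumably use Scala's regex support), `shorten` is a hypothetical name, and the sketch assumes the pattern contains no quoted characters.

```python
import re

PAD = re.compile(r"\*.")         # pad specifier: '*' followed by the pad character
BODY = re.compile(r"[#0-9,.]+")  # digits, grouping and decimal separators

def shorten(pattern):
    """Rewrite the negative part as prefix + "#" + suffix."""
    positive, sep, negative = pattern.partition(";")
    if not sep:
        return pattern  # nothing to simplify
    without_pad = PAD.sub("", negative)
    # Collapse the first (and normally only) numeric body to a single "#".
    simplified = BODY.sub("#", without_pad, count=1)
    return positive + ";" + simplified

print(shorten("+ *x#, ###,##0.00;- *x#,###,##0.00"))
# + *x#, ###,##0.00;- #
```

The positive part is passed through untouched, since that is where all of the formatting actually comes from.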


> textNumberPattern negative part should get warning if it specifies ignored 
> pattern characters
> ---------------------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2870
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2870
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Front End
>    Affects Versions: 3.6.0
>            Reporter: Mike Beckerle
>            Priority: Major
>              Labels: beginner



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
