[
https://issues.apache.org/jira/browse/DAFFODIL-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Beckerle updated DAFFODIL-2870:
------------------------------------
Description:
The dfdl:textNumberPattern property is an ICU number format pattern.
It has a positive part and an optional negative part separated by a ";".
ICU documents that the negative part is used only to define the negative sign
indication. So for example "#;-#" means a hyphen is used as a minus sign. The
pattern "#;(#)" means negative values are surrounded by parentheses. Other
pattern characters can be present, but are ignored except for indicating where
these sign indicator characters are relative to the digits of the number.
So for example "00000;- #" means -12 would be formatted as - 00012 because the
number of digits is taken from the positive pattern. Only the "- " (hyphen and
space) is taken from the negative pattern. This pattern means the same:
"00000;- ######00000000"
The only significance of the "######00000000" string in the negative pattern is
to indicate that the hyphen and space appear before the digits. In fact any
number specifier like "##,###,##0.00###" in a negative pattern is ignored and
really should just be written as a single "0" or "#" character. Other things
like the ICU pad character specifier, if they appear in the negative pattern,
are ignored as well regardless of the fact that they could be useful.
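The behavior can be tried out with the JDK's java.text.DecimalFormat, whose
documentation states the same rule for an explicit negative subpattern. This is
only a stand-in sketch: it does not use ICU, it assumes the JDK class behaves
like ICU for these simple patterns, and the class and method names are made up
for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class NegativePatternDemo {
    // Formats a value with the given pattern. US symbols are fixed so the
    // output does not depend on the default locale of the machine.
    static String format(String pattern, long value) {
        DecimalFormat df =
            new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return df.format(value);
    }

    public static void main(String[] args) {
        // The digit count comes from the positive part "00000" in both cases.
        System.out.println(format("00000;- #", -12));              // - 00012
        System.out.println(format("00000;- ######00000000", -12)); // - 00012
        // Parentheses as the negative indication.
        System.out.println(format("#;(#)", -12));                  // (12)
    }
}
```

Both of the first two calls give the same text, which is the point: the digit
specifiers written in the negative part have no effect on the result.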
The fact that these are allowed, yet ignored, is unintuitive, misleading, and
error-prone, because users are not going to realize that almost everything about the
negative pattern gets ignored.
Daffodil should warn if the negative pattern contains anything other than
a prefix, a single "#" or "0" character, and a suffix.
The warning message should say the negative pattern is only used to specify the
prefix and suffix used to indicate negative values. Everything else is ignored.
Ideally we should parse the negative pattern syntax and point out all the
ignored parts.
This warning should be suppressible via the usual WarnID mechanism.
We should consider having a tunable or property which, if set, escalates this
warning to a SchemaDefinitionError.
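The check described above could be sketched with one regular expression. This
is a rough illustration, not Daffodil code: the class name, method name, and
message constant are hypothetical, quoted literals are not handled, and the
pattern is split at the first ";" without regard to quoting. Suppression and
escalation are left out because they depend on Daffodil's own mechanisms.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class NegativePatternWarning {
    // Assumption: a prefix or suffix is any run of characters that are not
    // digit specifiers, grouping/decimal separators, pad markers, ';' or quotes.
    private static final String AFFIX = "[^#0-9,.@*;']*";

    // A minimal negative part: prefix, exactly one '#' or '0', suffix.
    private static final Pattern MINIMAL = Pattern.compile(AFFIX + "[#0]" + AFFIX);

    static final String MESSAGE =
        "The negative part of textNumberPattern is only used to specify the prefix "
        + "and suffix used to indicate negative values. Everything else is ignored.";

    // Returns the warning text, or empty when there is nothing to warn about.
    static Optional<String> check(String textNumberPattern) {
        int semi = textNumberPattern.indexOf(';');
        if (semi < 0) {
            return Optional.empty(); // no negative part at all
        }
        String negative = textNumberPattern.substring(semi + 1);
        if (MINIMAL.matcher(negative).matches()) {
            return Optional.empty();
        }
        return Optional.of(MESSAGE);
    }

    public static void main(String[] args) {
        System.out.println(check("00000;- #").isPresent());              // false
        System.out.println(check("00000;- ######00000000").isPresent()); // true
    }
}
```

Under these assumptions a pattern with quoted text in the negative part would
also trigger the warning, so a real implementation would need to handle quoting
before flagging anything.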
Honestly I think the only meaningful negative patterns are probably:
* "-#"
* "(#)"
* "#-"
With minor variations which insert spaces such as:
* "- #" (a space after the sign)
* "( # )" (spaces between digits and parens)
* "# -" (a space before the trailing sign)
Here's an example of a complex dfdl:textNumberPattern which makes the point
that the negative pattern is just a kind of trivial tail end.
{code:java}
dfdl:textNumberPattern="+ *x#,###,##0.00;- #"{code}
The only contribution of the negative part of that pattern is that "- " (hyphen
and space) is used as the prefix for negative values. The rest all comes from
the positive pattern.
The value negative 1234.5 would unparse as "- xxxx1,234.50".
If instead the user writes:
{code:java}
dfdl:textNumberPattern="+ *x#,###,##0.00;- *x#,###,##0.00"{code}
The warning should be issued and state that the negative part of this pattern
is mostly ignored. Only the "- " is significant, and the negative part could be
just "- #", so the whole pattern could be shortened to "+ *x#,###,##0.00;- #".
This requires a simple parse of the negative pattern to identify the
significant parts, but this is quite easy. (Look up Scala regex pattern
matching.)
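A rough sketch of such a parse follows, written in Java so it runs with only
the JDK. Everything here is an assumption for illustration: the class and
method names are hypothetical, quoted literals are not handled, the exponent
form is only approximated, and the split at the first ";" ignores quoting. It
extracts the prefix and suffix of the negative part and builds the shortened
pattern a warning message could suggest.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativePatternShortener {
    // Optional pad specifier: '*' followed by the pad character.
    private static final String PAD = "(?:\\*.)?";
    // Prefix or suffix: no digit specifiers, separators, pad markers, ';' or quotes.
    private static final String AFFIX = "([^#0-9,.@*;']*)";
    // Number specifier, optionally with a simple exponent such as E0 or E+00.
    private static final String NUMBER = "([#0-9,.@]+(?:E\\+?0+)?)";

    // Groups: 1 = prefix, 2 = number specifier (ignored), 3 = suffix.
    private static final Pattern NEGATIVE =
        Pattern.compile(PAD + AFFIX + PAD + NUMBER + PAD + AFFIX + PAD);

    // Returns the pattern with its negative part reduced to prefix + "#" + suffix.
    // The input is returned unchanged when there is no negative part or when
    // this simplified parse does not recognize it.
    static String shorten(String textNumberPattern) {
        int semi = textNumberPattern.indexOf(';');
        if (semi < 0) {
            return textNumberPattern;
        }
        String positive = textNumberPattern.substring(0, semi);
        Matcher m = NEGATIVE.matcher(textNumberPattern.substring(semi + 1));
        if (!m.matches()) {
            return textNumberPattern;
        }
        return positive + ";" + m.group(1) + "#" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(shorten("+ *x#,###,##0.00;- *x#,###,##0.00"));
        System.out.println(shorten("00000;- ######00000000"));
    }
}
```

Comparing the result of shorten with its input is one way to decide whether the
warning applies, and the difference between the two shows which parts were
ignored.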
> textNumberPattern negative part should get warning if it specifies ignored
> pattern characters
> ---------------------------------------------------------------------------------------------
>
> Key: DAFFODIL-2870
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2870
> Project: Daffodil
> Issue Type: Improvement
> Components: Front End
> Affects Versions: 3.6.0
> Reporter: Mike Beckerle
> Priority: Major
> Labels: beginner
>