[
https://issues.apache.org/jira/browse/DAFFODIL-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954686#comment-16954686
]
Mike Beckerle commented on DAFFODIL-2218:
-----------------------------------------
The fix will need to be a switch or option that we can use to get back the
older behavior. The ICU developers presumably changed this for a reason, and
in theory they have users depending on the new behavior now.
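
As a rough illustration of what such a switch could look like, here is a minimal sketch. It assumes a pre-check that runs before the text reaches the lenient ICU parser and rejects a leading "+" unless the number pattern itself contains one. The class name, the flag name, and the pre-check approach are all hypothetical; none of this is existing Daffodil or ICU API.

```java
// Hypothetical sketch: none of these names exist in Daffodil or ICU.
final class LaxPlusSignGuard {
    // Hypothetical switch: true restores the older behavior, in which lax
    // parsing did not accept a '+' that the pattern does not mention.
    private final boolean legacyPlusHandling;

    LaxPlusSignGuard(boolean legacyPlusHandling) {
        this.legacyPlusHandling = legacyPlusHandling;
    }

    // Returns true if the text may be handed on to the number parser.
    boolean allows(String pattern, String text) {
        if (!legacyPlusHandling) {
            return true; // newer ICU behavior: let the lenient parser decide
        }
        boolean dataHasLeadingPlus = text.trim().startsWith("+");
        // Simplification: a quoted literal '+' in the pattern is not
        // distinguished from a real plus sign here.
        boolean patternHasPlus = pattern.indexOf('+') >= 0;
        return !dataHasLeadingPlus || patternHasPlus;
    }

    public static void main(String[] args) {
        LaxPlusSignGuard legacy = new LaxPlusSignGuard(true);
        System.out.println(legacy.allows("#", "+123"));     // pattern has no '+'
        System.out.println(legacy.allows("+#;-#", "+123")); // pattern has a '+'
    }
}
```

With the flag off, the guard passes everything through, so users who depend on the new behavior would be unaffected.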
> ICU behavior incompatible - textNumberCheckPolicy lax is lax about "+" signs.
> Was not before.
> ----------------------------------------------------------------------------------------------
>
> Key: DAFFODIL-2218
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2218
> Project: Daffodil
> Issue Type: Bug
> Components: Back End, ICU, Libraries
> Reporter: Mike Beckerle
> Priority: Minor
> Fix For: 2.5.0
>
>
> ICU libraries changed behavior and now strict behavior is being lax about +
> signs.
> Daffodil should revert to the latest ICU version that doesn't have this
> problem.
> Likely we have to determine what ICU version this changed in, and back out to
> a prior one, as this new behavior is not implementing the DFDL spec behavior.
> See also https://issues.apache.org/jira/browse/DAFFODIL-845
> This from a DFDL Workgroup email thread on this subject:
> {code:java}
> Re: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy
> Steve Hanson <[email protected]> Fri, Aug 30, 10:56 AM
> to me, slawrence, DFDL-WG, Liam
> ICU changing behaviour in an incompatible way is not good.
> IBM DFDL is way behind, and is still on ICU 51.2. We are limited in what we
> can do as we try to keep the same level as IBM Integration Bus & WTX as we
> have had C namespacing issues in the past.
> Looking at the links, there are other changes that have crept in when lenient.
> - The string must contain a complete prefix and suffix. For example, if the
>   pattern is "{#};(#)", then "{123}" or "(123)" would match, but "{123",
>   "123}", and "123" would all fail. (The latter strings would be accepted in
>   lenient mode.)
> - Minus and plus signs can only appear if specified in the pattern. In
>   lenient mode, a plus or minus sign can always precede a number.
> In typical ICU fashion, even this is not complete. It says nothing about what
> happens if the pattern has a sign and the data doesn't.
> I suggest you test all the combos with Daffodil and establish the truth.
> Then we need to decide what to do. If there is no way of controlling this
> (e.g., parameter or env var) then the safest option is to back off Daffodil to
> the latest ICU release that matches the DFDL 1.0 spec, and change the spec so
> that the link to ICU is specific rather than the generic link which is in the
> spec today
> (http://www.icu-project.org/apiref/icu4c/classDecimalFormat.html#_details)
> and which floats to the latest release. We can't have a moving target.
> Regards
>
> Steve Hanson
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM DFDL
> Co-Chair, OGF DFDL Working Group
> [email protected]
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
> From: Mike Beckerle <[email protected]>
> To: DFDL-WG <[email protected]>
> Date: 29/08/2019 19:49
> Subject: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy
> Sent by: "dfdl-wg" <[email protected]>
> Looks like ICU changed behavior....
> From: Steve Lawrence <[email protected]>
> Sent: Thursday, August 29, 2019 1:30 PM
> To: [email protected]
> Subject: Re: Plus '+' sign and lax textNumberCheckPolicy - was: Re: How
> to model a fixed-length integer that may be padded with space on the left?
> I think this is a difference in ICU version?
> A little grepping through ICU source, I found a change [1] to their
> number parsing logic in Dec 2017:
> +  if (!isStrict) {
> +      parser.addMatcher(WhitespaceMatcher.getInstance());
> +      parser.addMatcher(new PlusSignMatcher());
> +  }
> That looks to me like a change to make it so plus signs are always
> matched in lax/lenient mode regardless of the pattern (Daffodil's current
> behavior). A couple of minor changes have been made to that section, but
> nothing that allows you to turn it off if lenient is on.
> It's hard to tell in the git history what release that was in, but it
> looks like around version 61, which is relatively new (Daffodil is on
> version 62).
> Also, the latest version of DecimalFormatProperties.java (looks to be an
> internal implementation, so no online javadocs), has javadocs that
> state that plus signs are always allowed in lenient/lax mode [2].
> I think this is a change in ICU behavior in newer versions.
> - Steve
> [1] https://github.com/unicode-org/icu/commit/68340c8464bd988477d6c88f46f9dfe4562a6d02#diff-565b07c255337881b4e06f766691667cR119-R122
> [2] https://github.com/unicode-org/icu/blob/master/icu4j/main/classes/core/src/com/ibm/icu/impl/number/DecimalFormatProperties.java#L53-L54
> {code}
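
To make the suggested combination testing concrete, the sign rule quoted in the thread from the ICU documentation can be tabulated as a starting checklist. This is only a sketch of the documented rule, not of ICU's implementation and not of observed Daffodil behavior. The class, enum, and method names are hypothetical, and the case the documentation leaves open (pattern has a sign, data does not) is marked UNSPECIFIED rather than guessed.

```java
// Hypothetical names; this tabulates the quoted documentation, not ICU code.
final class SignRuleTable {
    enum Outcome { ACCEPT, REJECT, UNSPECIFIED }

    // Expected outcome per the quoted rule for one combination.
    static Outcome expected(boolean strict, boolean patternHasSign, boolean dataHasSign) {
        if (dataHasSign) {
            // Lenient: a plus or minus sign can always precede a number.
            // Strict: signs can only appear if specified in the pattern.
            return (!strict || patternHasSign) ? Outcome.ACCEPT : Outcome.REJECT;
        }
        // Pattern has a sign but the data does not: the quoted text is silent.
        return patternHasSign ? Outcome.UNSPECIFIED : Outcome.ACCEPT;
    }

    public static void main(String[] args) {
        boolean[] flags = { false, true };
        // Enumerate all eight combinations to use as a test checklist.
        for (boolean strict : flags) {
            for (boolean patternHasSign : flags) {
                for (boolean dataHasSign : flags) {
                    System.out.printf("strict=%b patternHasSign=%b dataHasSign=%b -> %s%n",
                            strict, patternHasSign, dataHasSign,
                            expected(strict, patternHasSign, dataHasSign));
                }
            }
        }
    }
}
```

Each row would still need to be checked against the real parser on a specific ICU version to establish what actually happens.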
--
This message was sent by Atlassian Jira
(v8.3.4#803005)