Thanks Sherman, yes I agree.

Cheers
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leon...@uk.ibm.com
From: Xueming Shen <xueming.s...@gmail.com>
To: core-libs-dev@openjdk.java.net
Date: 08/01/2019 16:50
Subject: Re: JDK-8215626 : Correct [^..&&..] intersection negation behaviour JDK8 vs JDK11 ??
Sent by: "core-libs-dev" <core-libs-dev-boun...@openjdk.java.net>

Hi Andrew,

See [1]/[2] for the background of the fix. I would say the jdk11 behavior is correct and expected :-) Anyway, it's a behavior change, so it probably will not be easy to get it back into jdk8.

Regards,
Sherman

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-June/006957.html
[2] http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/lambdafunction

On 1/7/19 5:50 AM, Andrew Leonard wrote:
> Anyone got any views on which "regex" behaviour is correct, JDK8 or JDK11?
> thanks
> Andrew
>
> From: Andrew Leonard/UK/IBM
> To: "OpenJDK Core Libs Developers" <core-libs-dev@openjdk.java.net>
> Date: 03/01/2019 11:20
> Subject: JDK-8215626 : Correct [^..&&..] intersection negation behaviour JDK8 vs JDK11 ??
>
> Hi,
> I'm currently investigating bug JDK-8215626 and have discovered that the
> problem lies in the Pattern interpretation of the [^..&&..] negation when
> applied to "intersected" expressions. So I have simplified the bug example
> to a more extreme and obvious one:
> Input string: "1234 ABCDEFG !$%^& abcdefg"
> pattern RegEx: "[^[A-B]&&[^ef]]"
> Operation: pattern.matcher(input).replaceAll("");
>
> JDK8 output:
> 1234 CDEFG !$%^& abcdefg
> JDK11 output:
> AB
>
> So from the "spec":
> A character class is a set of characters enclosed within square brackets.
> It specifies the characters that will successfully match a single
> character from a given input string.
> Intersection:
> To create a single character class matching only the characters common to
> all of its nested classes, use &&, as in [0-9&&[345]].
> Negation:
> To match all characters except those listed, insert the "^" metacharacter
> at the beginning of the character class.
>
> The way I read the "spec", the "^" negation negates the whole character
> class within the outer square brackets. Thus in this example
> "[^[A-B]&&[^ef]]" is equivalent to the negation of "[[A-B]&&[^ef]]",
> i.e. the negation of the intersection of the chars A,B with everything
> other than e,f, which is simply the negation of A,B. Hence the operation
> above will remove every character in the input string other than A and B.
> So JDK11, in my opinion, meets the "spec". It looks as though JDK8 is
> applying the ^ negation to just [A-B] and then intersecting it with [^ef],
> which to me is the wrong interpretation of the "spec".
>
> Your thoughts please?
>
> If JDK11 is correct and JDK8 wrong, then the next question is: do we fix
> JDK8, given the obvious potential "behavioural" impact on existing
> applications?
>
> Thanks
> Andrew
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
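The example from the original report can be reproduced with a small standalone Java program, which makes it easy to compare JDKs side by side. This is a minimal sketch: the class name IntersectionNegationDemo and the helpers stripMatches and agreesWithNegatedAB are hypothetical, written only for illustration. The second check encodes the JDK11 reading described in the report (the leading ^ negates the whole intersection, so the class should match exactly what [^AB] matches) and compares the two classes over the ASCII range only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class reproducing the JDK-8215626 example from the thread.
class IntersectionNegationDemo {

    // Removes every character in 'input' that the given regex matches,
    // mirroring pattern.matcher(input).replaceAll("") from the report.
    static String stripMatches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    // Assumption: the JDK11 reading, where the leading ^ negates the whole
    // class, so [^[A-B]&&[^ef]] should match exactly what [^AB] matches.
    // Compares the two classes character by character over ASCII (0x00-0x7F).
    static boolean agreesWithNegatedAB() {
        Pattern intersected = Pattern.compile("[^[A-B]&&[^ef]]");
        Pattern negatedAB = Pattern.compile("[^AB]");
        for (int c = 0; c < 0x80; c++) {
            String s = String.valueOf((char) c);
            if (intersected.matcher(s).matches() != negatedAB.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String input = "1234 ABCDEFG !$%^& abcdefg";
        System.out.println("java.version: " + System.getProperty("java.version"));
        System.out.println("replaceAll result: " + stripMatches(input, "[^[A-B]&&[^ef]]"));
        System.out.println("matches [^AB] on ASCII: " + agreesWithNegatedAB());
    }
}
```

On a JDK with the JDK11 behaviour described in the thread, the result line should show AB and the cross-check should print true; the thread reports 1234 CDEFG !$%^& abcdefg as the JDK8 result for the same input.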