*Pretext* The reference implementation has strange behavior when there are nested character classes involved.
This is described briefly in the bug report: https://bugs.openjdk.java.net/browse/JDK-6609854 And also brought up again with more details on stackoverflow.com http://stackoverflow.com/questions/21934168/double-negation-of-regex-character-classes *Proposed patch* I have looked at the current implementation and I'd like to propose the following patch. The patch is *not yet tested against the test cases in test package*. However, *the structure has been verified via reflection*. *(Commented lines are lines to be removed. Lines with trailing comment are those to be added.)* In method private CharProperty clazz(boolean consume) case ']': firstInClass = false; if (prev != null) { if (consume) next(); if (!include) { // prev = prev.complement(); // } // return prev; } break; default: firstInClass = false; break; } node = range(bits); // if (include) { if (prev == null) { prev = node; } else { if (prev != node) prev = union(prev, node); } // } else { // if (prev == null) { // prev = node.complement(); // } else { // if (prev != node) // prev = setDifference(prev, node); // } // } ch = peek(); *Fix Explanation* In the current code, a nested character class that generates a single Node (for example, [^[^5-6]], [^[^c]], [^[^456]]) will never reach the check if (include) outside the switch statement. As a result, the negation is not applied to the nested character class. It also causes redundancy with patterns such as [^45[^67]] or [^4589[^67]], since the first character 4 triggers complement() on the BitClass bits and every subsequence character (5, 8, 9) triggers setDifference() on the BitClass bits and the complement of the same BitClass bits. *Current structure* [^45[^67]] Start. Start unanchored match (minLength=1) Pattern.union (character class union). Match any character matched by either character classes below: Pattern.setDifference (character class subtraction). Match any character matched by the 1st character class, but NOT the 2nd character class: CharProperty.complement (character class negation). Match any character NOT matched by the following character class: BitClass. Optimized character class with boolean[] to match characters in Latin-1 (code point <= 255). Match the following 2 character(s): [U+0034][U+0035] 45 BitClass. Optimized character class with boolean[] to match characters in Latin-1 (code point <= 255). Match the following 2 character(s): [U+0034][U+0035] 45 Pattern.setDifference (character class subtraction). Match any character matched by the 1st character class, but NOT the 2nd character class: CharProperty.complement (character class negation). Match any character NOT matched by the following character class: BitClass. Optimized character class with boolean[] to match characters in Latin-1 (code point <= 255). Match the following 2 character(s): [U+0036][U+0037] 67 BitClass. Optimized character class with boolean[] to match characters in Latin-1 (code point <= 255). Match the following 2 character(s): [U+0036][U+0037] 67 LastNode Node. Accept match *Structure after patch* [^45[^67]] Start. Start unanchored match (minLength=1) CharProperty.complement (character class negation). Match any character NOT matched by the following character class: Pattern.union (character class union). Match any character matched by either character classes below: BitClass. Optimized character class with boolean[] to match characters in Latin-1 (code point <= 255). Match the following 2 character(s): [U+0034][U+0035] 45 CharProperty.complement (character class negation). Match any character NOT matched by the following character class: BitClass. Optimized character class with boolean[] to match characters in Latin-1 (code point <= 255). Match the following 2 character(s): [U+0036][U+0037] 67 LastNode Node. Accept match