RFR: 8281315: Unicode, (?i) flag and backreference throwing IndexOutOfBounds Exception

Ian Graves Wed, 16 Feb 2022 10:52:52 -0800

This fixes a bug in the way CIBackRef traverses Unicode characters, which can
be variable-length. It originally followed the approach that BackRef takes,
but failed to account for Unicode characters that occupy two chars (surrogate
pairs). The upper bound (groupSize) of the traversal loop is the difference
between the group's start and end indexes, so it counts chars rather than
characters. That works when every character is a single char, and it also
works for case-sensitive comparisons, where a char-by-char comparison is
acceptable. It does not work for a comparison that requires some kind of
normalization (e.g. case folding), because the loop then advances by code
point. This fix adjusts the loop's upper bound whenever a two-char character
is encountered.
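
The idea can be illustrated with a minimal, standalone sketch. This is not the
patch itself: the class and method names (CaseInsensitiveRegionMatch,
regionMatchesIgnoreCase) are hypothetical, and the case comparison is a
simplified stand-in for what the regex engine does.

```java
// Hypothetical illustration, not the actual CIBackRef code.
class CaseInsensitiveRegionMatch {

    /**
     * Compares the code points of the group seq[groupStart, groupEnd) with
     * the code points starting at seq[from], ignoring case.
     */
    static boolean regionMatchesIgnoreCase(CharSequence seq, int groupStart,
                                           int groupEnd, int from) {
        // Upper bound measured in chars, not in code points.
        int groupSize = groupEnd - groupStart;
        int x = from;        // position in the input being matched
        int j = groupStart;  // position inside the captured group

        for (int index = 0; index < groupSize; index++) {
            if (x >= seq.length()) {
                return false; // input ended before the group was consumed
            }
            int c1 = Character.codePointAt(seq, x);
            int c2 = Character.codePointAt(seq, j);
            if (c1 != c2) {
                int u1 = Character.toUpperCase(c1);
                int u2 = Character.toUpperCase(c2);
                if (u1 != u2
                        && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                    return false;
                }
            }
            x += Character.charCount(c1);
            j += Character.charCount(c2);
            // A supplementary code point consumed two chars of the group,
            // so one fewer iteration remains. Without this adjustment the
            // loop would read past the end of the group (and possibly the
            // end of the input).
            if (Character.charCount(c2) == 2) {
                groupSize--;
            }
        }
        return true;
    }
}
```

For example, a string made of U+10400 followed by its lowercase form U+10428
is four chars long; with the group at [0, 2) and the comparison starting at
index 2, the loop above runs once instead of twice.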


An alternative was to determine the group's length in code points by scanning
the group in advance, but this could result in multiple scans and code point
conversions of the same matcher group, which could be long. Adjusting the loop
bound on the fly avoids this.
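
For comparison, the rejected pre-scan approach might look roughly like the
sketch below. The class and method names are hypothetical;
Character.codePointCount is the standard library call, and it has to walk the
whole group before any comparison starts.

```java
// Hypothetical illustration of the pre-scan alternative.
class GroupLengthPrescan {

    /**
     * Returns the number of code points in seq[groupStart, groupEnd).
     * This is an extra pass over the group, repeated on every attempt
     * to match the backreference.
     */
    static int codePointLength(CharSequence seq, int groupStart, int groupEnd) {
        return Character.codePointCount(seq, groupStart, groupEnd);
    }
}
```

The result could serve as the loop bound directly, at the cost of visiting
each char of the group twice per match attempt.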

-------------

Commit messages:
 - Adding test
 - Initial fix for IOOBE in CIBackRef

Changes: https://git.openjdk.java.net/jdk/pull/7501/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7501&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8281315
  Stats: 26 lines in 2 files changed: 22 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7501.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7501/head:pull/7501

PR: https://git.openjdk.java.net/jdk/pull/7501
