On Wed, 16 Feb 2022 22:41:40 GMT, Ian Graves <igra...@openjdk.org> wrote:

>> This fixes a bug in the way CIBackRef traverses Unicode characters, which 
>> can be variable-length. Originally it followed the approach that BackRef 
>> takes, but failed to account for Unicode characters that are two chars 
>> long. The upper bound (groupSize) for the traversing loop is set by the 
>> difference between the group start and stop indexes. This works for 
>> single-char characters, and it also works for case-sensitive comparisons 
>> because char-by-char comparisons are acceptable there, but it doesn't work 
>> for a comparison that requires some kind of normalization (such as case), 
>> since the loop then advances by code point while the bound counts chars. 
>> This fix adjusts the upper bound of the traversing loop whenever a 
>> two-char character is encountered.
>> 
>> An alternative was to determine the group's length in code points by 
>> scanning the group in advance, but this could result in multiple scans 
>> and code point conversions of the same matcher group, which could be 
>> long. The solution that adjusts the loop bounds on the fly avoids this 
>> case.
>
> Ian Graves has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Removing increment variable and some other tweaks

Looks fine.
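
For anyone following the thread, here is a minimal sketch of the kind of 
code-point-aware traversal the description above refers to. It is not the 
CIBackRef code from the PR: the class and method names (BackRefSketch, 
matchIgnoreCase) are hypothetical, and the case comparison is a simplified 
assumption. The point is only that the loop is bounded by the group's end 
index in chars while each step advances by Character.charCount.

```java
class BackRefSketch {
    // Hypothetical helper, not the JDK implementation.
    // Compares the group seq[groupStart, groupEnd) with the text starting
    // at index i, ignoring case, one code point at a time.
    // Returns the index just past the matched text, or -1 on mismatch.
    static int matchIgnoreCase(CharSequence seq, int groupStart,
                               int groupEnd, int i) {
        int j = groupStart; // position inside the captured group
        int x = i;          // position inside the text being matched
        // Bound the loop by the char index of the group end rather than by
        // a count of iterations, so two-char code points cannot overrun it.
        while (j < groupEnd) {
            if (x >= seq.length()) {
                return -1; // text ended before the group was consumed
            }
            int c1 = Character.codePointAt(seq, x);
            int c2 = Character.codePointAt(seq, j);
            if (c1 != c2) {
                // Simplified case-insensitive comparison (an assumption).
                int u1 = Character.toUpperCase(c1);
                int u2 = Character.toUpperCase(c2);
                if (u1 != u2
                        && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                    return -1;
                }
            }
            // Advance by one or two chars depending on the code point.
            x += Character.charCount(c1);
            j += Character.charCount(c2);
        }
        return x;
    }
}
```

With the input "\uD801\uDC00\uD801\uDC28" (U+10400 followed by its lowercase 
form U+10428), a group spanning chars 0 to 2 and a match starting at index 2, 
the loop runs once and the method returns 4.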

-------------

Marked as reviewed by naoto (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7501
