alhudz commented on PR #1687:
URL: https://github.com/apache/commons-lang/pull/1687#issuecomment-4639160153

   Not endemic, from what I can tell. Most of the `String`/`char` code in Lang 
operates on `char`s by design and is documented that way, so it's fine. The bug 
class I've been hitting is narrower: a method that scans by `char` index but 
then exposes a *code-point* count or contract at its boundary, so each 
supplementary character in the scanned range throws the count off by one.
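
   To make the mismatch concrete, here is a minimal sketch using only JDK `String` methods. The class and helper names are hypothetical and are not Lang code; they only show how a `char` index and a code-point index drift apart once a supplementary character precedes the match.

```java
public class CharVsCodePoint {

    // Hypothetical helper: scans by char index and returns a char index.
    static int charIndexOf(String s, int codePoint) {
        return s.indexOf(codePoint);
    }

    // Hypothetical helper: same scan, but reports the position in code points.
    static int codePointIndexOf(String s, int codePoint) {
        int charIndex = s.indexOf(codePoint);
        return charIndex < 0 ? -1 : s.codePointCount(0, charIndex);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (one code point, two chars), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                      // 4 chars
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(charIndexOf(s, 'b'));             // 3
        System.out.println(codePointIndexOf(s, 'b'));        // 2
    }
}
```

   Either index is fine on its own; the trouble only starts when one is returned where the documented contract promises the other.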
   
   I've found three of those seams so far, each with a reproducer:
   - `CharSequenceUtils.lastIndexOf` (#1684, merged)
   - `StringUtils.indexOfAny` (this one)
   - `LookupTranslator.translate` (#1691) — returned the matched key length in 
`char`s where the translator loop advances by `Character.charCount`
   
   The way I find them is to look for places that cross between char-indexed 
scanning and a code-point boundary, then build a supplementary-key case and 
check the count. I haven't done an exhaustive sweep of the whole class, so I 
can't promise these are the last three, but they're the ones I could actually 
reproduce rather than guess at.
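
   A rough sketch of what such a supplementary-key check can look like. Everything here is hypothetical stand-in code, not the actual Lang methods: `consumedInChars` models a lookup that reports the match length in `char`s, `consumedInCodePoints` models the code-point version, and `advance` models a caller that steps by `Character.charCount` once per consumed unit.

```java
public class SupplementaryKeyRepro {

    // Hypothetical lookup that reports the match length in chars.
    static int consumedInChars(String input, int pos, String key) {
        return input.startsWith(key, pos) ? key.length() : 0;
    }

    // Hypothetical lookup that reports the match length in code points.
    static int consumedInCodePoints(String input, int pos, String key) {
        return input.startsWith(key, pos) ? key.codePointCount(0, key.length()) : 0;
    }

    // Hypothetical caller: treats `consumed` as a code-point count and
    // advances one code point per unit.
    static int advance(String input, int pos, int consumed) {
        for (int i = 0; i < consumed && pos < input.length(); i++) {
            pos += Character.charCount(input.codePointAt(pos));
        }
        return pos;
    }

    public static void main(String[] args) {
        String key = "\uD83D\uDE00"; // U+1F600: one code point, two chars
        String input = key + "xy";

        // Char-length result fed to a code-point loop skips past 'x'.
        System.out.println(advance(input, 0, consumedInChars(input, 0, key)));      // 3
        // Code-point result lands just after the key.
        System.out.println(advance(input, 0, consumedInCodePoints(input, 0, key))); // 2
    }
}
```

   With a BMP-only key both paths agree, which is why these seams only show up once the test data includes a supplementary character.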


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
