alhudz commented on PR #1719:
URL: https://github.com/apache/commons-lang/pull/1719#issuecomment-4759024996

   Had a look at `testEmoji()`. With the 
`expectedResultsFox`/`expectedResultsFamilyWithCodepoints` assertions 
re-enabled, the test fails both before and after this PR, so those assertions 
have never been green on `master`:
   
   `abbreviate("🦊…", 4)`
   - before: `<lone high surrogate>...` (the malformed output this PR targets)
   - this PR: `...`
   - `testEmoji` expects: `🦊...`
   
   The gap is the contract. `testEmoji` counts `maxWidth` in code points: the 
marker counts as 3, followed by `maxWidth - 3` whole code points (width 4 → 1 
fox, width 5 → 2 foxes, and the family case counts each skin-tone modifier and 
ZWJ as one). So `🦊...` is 5 `char`s but 4 code points. This PR keeps the 
documented `char`-based contract (`result.length() <= maxWidth`) and only 
nudges the cut back to a code-point boundary, which can only shorten the 
result, so it drops the partial emoji rather than keeping it whole and 
exceeding `maxWidth`.
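
   A minimal sketch of the `char`-based behaviour described above. The helper 
`alignToCodePoint` and the simplified `abbreviateChars` are hypothetical names 
for illustration, not the PR's actual code, and offsets and custom markers are 
ignored. It only shows why width 4 yields `...` while width 5 keeps one whole 
fox:

```java
public class SurrogateSafeCut {

    // Hypothetical helper: if 'cut' falls between a high and a low surrogate,
    // move it back by one char so the pair is dropped as a whole.
    static int alignToCodePoint(String s, int cut) {
        if (cut > 0 && cut < s.length()
                && Character.isHighSurrogate(s.charAt(cut - 1))
                && Character.isLowSurrogate(s.charAt(cut))) {
            return cut - 1;
        }
        return cut;
    }

    // Simplified char-based abbreviation: maxWidth is counted in chars, so
    // result.length() <= maxWidth holds for every maxWidth >= 4.
    static String abbreviateChars(String s, int maxWidth) {
        final String marker = "...";
        if (maxWidth < marker.length() + 1) {
            throw new IllegalArgumentException("Minimum abbreviation width is 4");
        }
        if (s.length() <= maxWidth) {
            return s;
        }
        int cut = alignToCodePoint(s, maxWidth - marker.length());
        return s.substring(0, cut) + marker;
    }

    public static void main(String[] args) {
        String fox = new String(Character.toChars(0x1F98A)); // one fox, two chars
        String foxes = fox + fox + fox;

        String w4 = abbreviateChars(foxes, 4); // cut at 1 is mid-pair, moved to 0
        String w5 = abbreviateChars(foxes, 5); // cut at 2 is already a boundary

        System.out.println(w4.length()); // 3: only the marker survives
        System.out.println(w5.length()); // 5: one fox plus the marker
        System.out.println(w5.codePointCount(0, w5.length())); // 4
    }
}
```

   Moving the cut backwards is what preserves `length() <= maxWidth`; moving 
it forwards would keep the emoji but exceed the width by one `char`.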
   
   So this change is strictly the lone-surrogate fix. Making `testEmoji` pass 
is the larger LANG-1770 job of re-basing `abbreviate` on code points, which 
redefines both what `maxWidth` means and the `length() <= maxWidth` guarantee. 
Happy to take that on as the actual LANG-1770 fix if you want it in this PR, or 
keep this one scoped to removing the malformed surrogate. Which would you 
prefer?
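
   For comparison, a rough sketch of what a code-point-based contract could 
look like. `abbreviateCodePoints` is a hypothetical name, not an existing 
Commons Lang method, and the sketch counts code points rather than grapheme 
clusters, so a ZWJ family sequence could still be split between its members:

```java
public class CodePointAbbreviate {

    // Hypothetical sketch: maxWidth is counted in code points, marker included.
    // The result may be longer than maxWidth when measured with length().
    static String abbreviateCodePoints(String s, int maxWidth) {
        final String marker = "...";
        if (maxWidth < marker.length() + 1) {
            throw new IllegalArgumentException("Minimum abbreviation width is 4");
        }
        if (s.codePointCount(0, s.length()) <= maxWidth) {
            return s;
        }
        // Index of the char that follows the first (maxWidth - 3) code points.
        int end = s.offsetByCodePoints(0, maxWidth - marker.length());
        return s.substring(0, end) + marker;
    }

    public static void main(String[] args) {
        String fox = new String(Character.toChars(0x1F98A));
        String foxes = fox + fox + fox + fox + fox + fox; // 6 code points, 12 chars

        String w4 = abbreviateCodePoints(foxes, 4); // one fox plus the marker
        String w5 = abbreviateCodePoints(foxes, 5); // two foxes plus the marker

        System.out.println(w4.length()); // 5, which exceeds maxWidth = 4
        System.out.println(w5.length()); // 7, which exceeds maxWidth = 5
    }
}
```

   Under this contract the invariant becomes 
`result.codePointCount(0, result.length()) <= maxWidth`, which is the change in 
meaning that the question above is about.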


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
