[PR] reject non-BMP code points in the Character type converter [commons-cli]

via GitHub Fri, 05 Jun 2026 08:07:32 -0700


digi-scrypt opened a new pull request, #425:
URL: https://github.com/apache/commons-cli/pull/425


   1. the Character converter parses a \uXXXX escape and returns 
Character.toChars(codePoint)[0], so a supplementary code point like \u1F600 
silently becomes the lone high surrogate U+D83D.
   2. that unpaired surrogate is not a valid character on its own and gets 
lost downstream (encoding it as UTF-8 yields '?').
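
A minimal sketch of the two points above, for anyone who wants to reproduce the truncation. The class and method names are made up for illustration and this is not the converter's actual source; it only mirrors the parse-then-take-index-0 behaviour described in item 1.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateTruncationDemo {

    // Hypothetical helper: parse the hex digits after the two-character
    // escape prefix and keep only the first UTF-16 code unit.
    static char firstUnit(String escape) {
        int codePoint = Integer.parseInt(escape.substring(2), 16);
        return Character.toChars(codePoint)[0];
    }

    // Encode a single char as UTF-8 and decode the bytes again.
    static String roundTrip(char c) {
        byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char c = firstUnit("\\u1F600");
        System.out.println(Integer.toHexString(c));        // d83d
        System.out.println(Character.isHighSurrogate(c));  // true
        System.out.println(roundTrip(c));                  // ?
    }
}
```

String.getBytes(Charset) replaces malformed input with the charset's replacement bytes, which is why the lone surrogate comes back as '?'.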
   
   This change checks that the code point is in the BMP before converting it 
to a char and throws otherwise, so the value is rejected at conversion instead 
of being quietly truncated. A char is a single 16-bit UTF-16 code unit, so it 
cannot hold an astral (supplementary) code point. Added a parameterized case 
for \u1F600 next to the existing \u0124 one.
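
For reviewers, a rough sketch of the kind of check described above. The class name, method name, exception type, and message are assumptions made for illustration, not the code in this PR; Character.isBmpCodePoint is the standard JDK method.

```java
public class BmpCharConverterSketch {

    // Hypothetical stand-in for the escape branch of the converter.
    // The exception type and message here are assumptions.
    static char parseEscape(String escape) {
        int codePoint = Integer.parseInt(escape.substring(2), 16);
        if (!Character.isBmpCodePoint(codePoint)) {
            // Reject instead of returning only the high surrogate.
            throw new IllegalArgumentException(
                String.format("Code point U+%X does not fit in a single char", codePoint));
        }
        return (char) codePoint;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(parseEscape("\\u0124"))); // 124
        try {
            parseEscape("\\u1F600");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that this sketch only checks the BMP range, so an escape that names a surrogate code point directly would still pass through.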


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.