On Wed, 6 Aug 2025 18:00:00 GMT, Xueming Shen <sher...@openjdk.org> wrote:

>>> for (char c = 0xFF; c < 0xFFFF; c++)
>>
>> Doesn't this exclude `0xFFFF`, which is a valid (single-`char`,
>> non-surrogate) BMP character?
>>
>>> ... we can just pick any non-bmp panel ...
>>> ```
>>> for (int i = 0x10000; i < 0x1FFFF; i++) { ...
>>> ```
>>
>> Doesn't the non-BMP range rather end with 0x10FFFF?
>
> (1) we might want to include 0xffff in first pass
> (2) we just need to pick any unmappable non-bmp character, i would assume
> that it should be pretty safe we will find one in the first non-bmp panel
> that is not encoded by a specific charset.

In f567f2c81a3, I improved `findUnmappableNonLatin1()` as suggested:

Single-`char`:

    for (int i = 0xFF; i <= 0xFFFF; i++) {
        char c = (char) i;

Double-`char` (i.e., surrogate pair):

    int[] nonBmpRange = {0x10000, 0x10FFFF};
    for (int i = nonBmpRange[0]; i < nonBmpRange[1]; i++) {

Note that I took the initiative to use 0x10FFFF as the non-BMP range end, since an exhaustive search is easier to understand.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2260609569
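
For readers following the thread, here is a minimal, self-contained sketch of the two search passes discussed above. It is not the code from the PR: the class name `UnmappableSearch`, both method names, and the use of `CharsetEncoder.canEncode` are assumptions made for illustration. It also uses an inclusive upper bound on the non-BMP pass, which differs from the `<` in the quoted snippet. The single-`char` pass uses an `int` counter because a `char` counter compared with `c <= 0xFFFF` would wrap around to 0 and never terminate.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not taken from the PR.
class UnmappableSearch {

    // Returns the first non-surrogate BMP code point in [0xFF, 0xFFFF]
    // that the charset cannot encode, or -1 if there is none.
    static int findUnmappableBmp(Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        // int counter: a char counter would overflow before failing `<= 0xFFFF`
        for (int i = 0xFF; i <= 0xFFFF; i++) {
            char c = (char) i;
            if (Character.isSurrogate(c)) {
                continue; // lone surrogates are not valid characters
            }
            if (!encoder.canEncode(c)) {
                return i;
            }
        }
        return -1;
    }

    // Returns the first supplementary (non-BMP) code point in
    // [0x10000, 0x10FFFF] that the charset cannot encode, or -1 if none.
    static int findUnmappableNonBmp(Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        for (int cp = Character.MIN_SUPPLEMENTARY_CODE_POINT;
                 cp <= Character.MAX_CODE_POINT; cp++) {
            // Each supplementary code point becomes a surrogate pair (two chars)
            String pair = new String(Character.toChars(cp));
            if (!encoder.canEncode(pair)) {
                return cp;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.printf("BMP:     U+%04X%n", findUnmappableBmp(latin1));
        System.out.printf("non-BMP: U+%04X%n", findUnmappableNonBmp(latin1));
    }
}
```

With a charset that covers all of Unicode, such as UTF-8, both methods return -1 after scanning the full range, which is the case the exhaustive 0x10FFFF bound makes explicit.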