https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692
--- Comment #15 from Yuri Pankov <[email protected]> ---
Properly fixing U+FF08 and other full-width characters is more involved.

A bit of background: we have a common ctype definitions file for all UTF-8 locales, src/share/ctypedef/en_US.UTF-8.src (all other locales symlink to the resulting /usr/share/locale/en_US.UTF-8/LC_CTYPE). src/share/ctypedef/en_US.UTF-8.src is in turn assembled from src/tools/tools/locale/etc/common.UTF-8.src and src/tools/tools/locale/etc/manual-input.UTF-8.

src/tools/tools/locale/etc/common.UTF-8.src is built using src/tools/tools/locale/tools/utf8-rollup.pl. That script contains the character ranges belonging to different locales and, among other things, checks the LC_CTYPE section of the corresponding .UTF-8.src file for the character class to use. If a character is not defined there, it does not get into common.UTF-8.src. That is exactly the case here: the character is found in neither ja_JP.UTF-8.src nor ko_KR.UTF-8.src.

TL;DR: all such characters that are not defined anywhere in the CLDR-derived *.UTF-8.src files need to be added to manual-input.UTF-8.

CLDR v34 is very close to being released, but I strongly doubt the new *.UTF-8.src files will define the full-width characters we are missing. To address the issue for the release, I propose looking these characters up in the UTF-8.src we had in src/share/mklocale in pre-11.x times and adding them to manual-input.UTF-8.

The longer-term solution would be to ask the CLDR guys whether there is a way to build a complete ctype map while building the POSIX locale data files. If that's not possible, we would go over the entire UTF-8.src contents and add the missing bits to manual-input.UTF-8.
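The rollup behaviour described above can be modeled in a few lines to see why a character like U+FF08 disappears. This is a hypothetical simplification, not utf8-rollup.pl itself (which is Perl). The names RANGES, LOCALE_CTYPE, rollup() and guess_classes() are invented for illustration, and the data is toy data. guess_classes() derives ctype classes from the Unicode general category via Python's unicodedata module. That is an assumption made for this sketch, not how FreeBSD assigns classes.

```python
"""Hypothetical model of a per-locale ctype rollup (not the real utf8-rollup.pl).

Each locale "owns" some code point ranges, and a code point reaches the
common map only if the owning locale's LC_CTYPE data assigns it a class.
"""
import unicodedata

# Hypothetical ownership: locale name -> list of inclusive (start, end) ranges.
RANGES = {
    "ja_JP": [(0xFF01, 0xFF0F)],  # slice of the Halfwidth and Fullwidth Forms block
}

# Hypothetical LC_CTYPE contents parsed from each <locale>.UTF-8.src:
# code point -> set of ctype classes. U+FF08 is deliberately absent.
LOCALE_CTYPE = {
    "ja_JP": {0xFF01: {"punct", "graph", "print"}},
}


def rollup(ranges, locale_ctype):
    """Return (common, missing): classified code points and silently dropped ones."""
    common, missing = {}, []
    for locale, spans in ranges.items():
        defined = locale_ctype.get(locale, {})
        for start, end in spans:
            for cp in range(start, end + 1):
                if cp in defined:
                    common[cp] = defined[cp]
                else:
                    missing.append(cp)  # never reaches the common map
    return common, missing


def guess_classes(cp):
    """Rough ctype guess from the Unicode general category (an assumption).

    Ignores upper/lower and other details; meant only as a starting point
    for manual review before anything is added to manual-input.UTF-8.
    """
    cat = unicodedata.category(chr(cp))
    if cat == "Zs":
        return {"space", "blank", "print"}
    if cat.startswith("L"):
        base = {"alpha"}
    elif cat == "Nd":
        base = {"digit"}
    elif cat[0] in ("P", "S"):
        base = {"punct"}
    else:
        return set()
    return base | {"graph", "print"}


if __name__ == "__main__":
    common, missing = rollup(RANGES, LOCALE_CTYPE)
    for cp in missing:
        name = unicodedata.name(chr(cp), "?")
        print(f"U+{cp:04X} {name}: {sorted(guess_classes(cp))}")
```

With the toy data, U+FF08 FULLWIDTH LEFT PARENTHESIS lands in `missing` with a guessed class set of `['graph', 'print', 'punct']`. The printed list is only a review aid. It is not written in the manual-input.UTF-8 syntax, and any real entries should be checked against the old src/share/mklocale UTF-8.src as proposed above.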
