Hello,

Thanks for your response.

Bruno Haible wrote:
> Regarding the Default_Ignorable_Code_Point characters: Making all of
> them non-spacing would assign width 0 to the characters
>   U+115F HANGUL CHOSEONG FILLER
>   U+3164 HANGUL FILLER
>   U+FFA0 HALFWIDTH HANGUL FILLER
> But this does not make sense to me:
>
> * You exclude U+115F from your consideration, but the justification
>   is weak: Hangul composition of 3 characters in the range U+11xx
>   creates a Hangul syllable, and widths don't add up: 1 + 1 + 1 != 2 in
>   the general case.

The combining Hangul jamo characters are all assigned an
`East_Asian_Width` of `Wide` by Unicode, which would normally mean they
would all be assigned width 2; a combination of (leading choseong) +
(medial jungseong) + (trailing jongseong) would have width
2 + 2 + 2 = 6. However, this library (and glibc, and other wcwidth
implementations) special-cases jungseong and jongseong, assigning them
all width 0, to ensure that the complete block has width
2 + 0 + 0 = 2. Assigning U+115F a width of 2 even though it has no
visible display is necessary to keep this scheme working. You can read
more about Unicode jamo in the Unicode spec, sections 3.12
<https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G24646>
and 18.6
<https://www.unicode.org/versions/Unicode15.0.0/ch18.pdf#G31028>.

> * The names of U+FFA0 being "HALFWIDTH HANGUL FILLER", it suggests
>   that "HANGUL FILLER" traditionally has width 2 and "HALFWIDTH HANGUL
>   FILLER" traditionally has width 1. If both had width 0, there would
>   not be a need for the HALFWIDTH one.

U+3164 HANGUL FILLER and U+FFA0 HALFWIDTH HANGUL FILLER are
compatibility characters that exist purely for interoperability with
legacy character sets. That's why their behavior may seem strange.
I have found some historical background on them here, though I can't be
sure it's fully accurate:
<https://github.com/jagracey/Awesome-Unicode/issues/4>

> * glibc's wcwidth() function returns nonzero for these characters:

I've submitted a patch to glibc as well.

> * Your argument by an FAQ is weak, since FAQs typically tend to
>   simplify things, so that they become easier to state or to
>   understand.

The Unicode Standard, version 15.0, §5.21 - Characters Ignored for
Display
<https://www.unicode.org/versions/Unicode15.1.0/ch05.pdf#G40095>
states that all `Default_Ignorable_Code_Point`s should be "ignored for
display in fallback rendering", including "Hangul fillers". There is no
ambiguity here, and common rendering implementations treat them as
zero-width just like the spec says.

---

Since submitting this patch, I've noticed that §5.21 also highlights
another issue with gnulib (and glibc)'s width implementation:

"""
A small number of format characters (General_Category = Cf) are also
not given the Default_Ignorable_Code_Point property. This may surprise
implementers, who often assume that all format characters are generally
ignored in fallback display. The exact list of these exceptional format
characters can be found in the Unicode Character Database. There are,
however, three important sets of such format characters to note:

- prepended concatenation marks
- interlinear annotation characters
- Egyptian hieroglyph format controls

[...]

The other two notable sets of format characters that exceptionally are
not ignored in fallback display consist of the interlinear annotation
characters, U+FFF9 INTERLINEAR ANNOTATION ANCHOR through U+FFFB
INTERLINEAR ANNOTATION TERMINATOR, and the Egyptian hieroglyph format
controls, U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER through U+1343F
EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE.
These characters should have a visible glyph display for fallback
rendering, because if they are not displayed, it is too easy to misread
the resulting displayed text. See “Annotation Characters” in Section
23.8, Specials[0], as well as Section 11.4, Egyptian Hieroglyphs[1] for
more discussion of the use and display of these characters.

[0]: https://www.unicode.org/versions/Unicode15.1.0/ch23.pdf#M9.21335.Heading.133.Specials
[1]: https://www.unicode.org/versions/Unicode15.1.0/ch11.pdf#M9.73291.Heading.1418.Egyptian.Hieroglyphs
"""

There is no way for a terminal to realistically handle the interlinear
annotation anchors except via fallback rendering, so these should have
non-zero width also. As for the hieroglyph format controls, they appear
to not have wide support at present, so assuming fallback rendering
likely makes sense there as well.

This would imply that we should perhaps change the zero-width detection
logic to not check whether a character is a format control (Cf) at all,
and only check for `Default_Ignorable_Code_Point`s, categories Cc, Me,
and Mn, and the Hangul special cases.

Jules Bertholet
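
P.S. To make the proposed rule concrete, here is a rough sketch in
Python (not gnulib's actual C code). The names `is_zero_width` and
`DEFAULT_IGNORABLE_SAMPLE` are made up for illustration. The range table
is a small hand-copied subset of `Default_Ignorable_Code_Point`, and the
jungseong/jongseong range U+1160..U+11FF is my assumption about what the
"Hangul special cases" cover; a real implementation would generate both
from the Unicode Character Database.

```python
import unicodedata

# ASSUMPTION: partial, hand-copied subset of Default_Ignorable_Code_Point.
# The authoritative list lives in DerivedCoreProperties.txt.
DEFAULT_IGNORABLE_SAMPLE = [
    (0x00AD, 0x00AD),  # SOFT HYPHEN
    (0x200B, 0x200F),  # ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK
    (0x2060, 0x2064),  # WORD JOINER .. INVISIBLE PLUS
    (0x3164, 0x3164),  # HANGUL FILLER
    (0xFE00, 0xFE0F),  # VARIATION SELECTOR-1 .. VARIATION SELECTOR-16
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE
    (0xFFA0, 0xFFA0),  # HALFWIDTH HANGUL FILLER
]


def is_zero_width(ch: str) -> bool:
    """Sketch of the proposed test; note that Cf is never consulted."""
    cp = ord(ch)
    # Hangul special cases: U+115F keeps its width so that a syllable
    # block still adds up to 2 + 0 + 0 = 2.
    if cp == 0x115F:
        return False
    # ASSUMPTION: jungseong and jongseong are taken to be U+1160..U+11FF.
    if 0x1160 <= cp <= 0x11FF:
        return True
    if any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_SAMPLE):
        return True
    # Controls, enclosing marks, and non-spacing marks.
    return unicodedata.category(ch) in {"Cc", "Me", "Mn"}


def zero_width_flags(text: str) -> list[bool]:
    """Per-character result, handy for checking a decomposed syllable."""
    return [is_zero_width(ch) for ch in text]
```

With this rule, U+FFF9..U+FFFB and U+13430..U+1343F fall through to the
category check and are not zero-width, while the two compatibility
fillers U+3164 and U+FFA0 are.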
