On Wed, Nov 22, 2023 at 11:04 AM Rob Landley <r...@landley.net> wrote:
>
> Wow, how long has THIS one been buried behind other windows? (Trying to
> finally reboot my laptop so I can upgrade stuff...)
>
> On 10/11/23 11:13, enh wrote:
> > On Wed, Oct 11, 2023 at 3:22 AM Rob Landley <r...@landley.net> wrote:
> >>
> >> On 10/6/23 05:05, Rob Landley wrote:
> >> > Apparently the widest unicode characters are:
> >> >
> >> > 1. ﷽
> >> >
> >> > 2. 𒐫
> >> >
> >> > 3. 𒈙
> >> >
> >> > 4. ⸻
> >> >
> >> > 5. ꧅
> >> >
> >> > The first 4 of which xfce's terminal does NOT like. And thunderbird
> >> > fits the first one in 3 columns while vim's giving it... 9 I think.
> >>
> >> And trying to add a file with those to the test suite, neither glibc nor
> >> musl is returning a useful wcwidth() for them (it's all 1). And washing
> >> the attempt through ltrace it looks like their unicode code points aren't
> >> defined in https://www.w3.org/TR/xml-entity-names/1D7.html and friends.
> >>
> >> Which is odd because the web browser and terminal and so on render them
> >> properly. But if neither glibc nor musl can handle them, I can't add
> >> "fold" tests for them, can I? (Haven't tried bionic, but possibly this is
> >> what Elliott meant when he said he used a bigger gui library for this
> >> sort of thing...)
> >
> > yeah, bionic _does_ implement wcwidth() but admits that it's fairly
> > bogus.
>
> I miss java 1.1's fontmetrics with the awt and lightweight canvas where we
> just wrote our own widget set and it worked. It was the first graphical
> toolkit I'd actually been _comfortable_ with since logo. (And I say that
> having learned IBM's System Object Model in order to maintain a project
> implemented as a subclass of the OS/2 Workplace Shell's "folder" class.)
>
> Pity they added swing (hell no), and then Sun screwed over blackdown so
> hard I fled screaming from the entire language...
>
> > if you really care, not even icu4c (my usual answer to such
> > questions, and something bionic regularly forwards such questions to),
> > you want to talk to something like
> > https://en.wikipedia.org/wiki/HarfBuzz instead --- this shit gets
> > weird, fast.
>
> Yes, but that's not really the question I'm asking.

no, but it's the question you actually _need_ to ask if you're worried
about doing something _useful_ --- it's probably better to think of
some scripts as "nothing but combining characters".

> How often do new unicode
> tables come out and do they ever really make big changes?

"about one/year" [citation needed? https://en.wikipedia.org/wiki/Unicode#Versions]

> There are only 1.1
> million possible values, this is not a big table of numbers in a modern
> computing context, and there presumably ARE answers?

my point is that it's the _combinations_ that are interesting. that's
why i mentioned harfbuzz.
https://harfbuzz.github.io/why-do-i-need-a-shaping-engine.html is a
good high-level intro (the paragraph containing the word "arabic" in
particular).

> [scribble scribble scribble...]
>
> The attached fontmetrics.c prints each character and asks the terminal (in
> this case xfce's) how many columns the cursor moved, using the query cursor
> position ascii escape sequence. You run it ala "./fontmetrics > out.txt" and
> then leave that terminal alone for a while. (Alas with "| tee out.txt"
> instead of a redirect the conflicting writes to stdin and stdout
> occasionally glitch slightly.) The results are 0 columns, 1 column, 2
> columns, and everything else.
> It's still running (slow) but so far I've got:
>
> $ for i in 0 1 2 '-v =[012]'; do grep $i'$' out.txt | wc -l; done
> 1160
> 10734
> 15891
> 2
>
> And those two weirdos are:
>
> $ grep -v '=[012]' out.txt
> 9=8
> 89=8
>
> And two of those "else" are tab (which is weird) and enter (which I think
> confused it, partly because it was in raw mode so it could read the returned
> sequences without waiting for a newline).
>
> I'm not quite sure what's up with 0x89, but:
>
> $ toybox unicode 0x88
> U+0088 : : 0xc2 0x88
> landley@driftwood:~/toybox/toybox$ toybox unicode 0x89
> U+0089 : : 0xc2 0x89
>
> I mean yeah, I'm seeing it. (High tab?) Haven't poked much yet.
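
For readers without the attachment: the measurement described above (print something, then ask the terminal where the cursor ended up) can be sketched roughly as follows. This is not the attached fontmetrics.c, just a minimal illustration under assumptions: the function names parse_cpr(), query_column() and measure_columns() are made up here, fd is assumed to be a terminal that answers the "ESC [ 6 n" cursor position query with "ESC [ row ; col R", and wrapping at the right edge of the screen is ignored.

```c
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

// Parse a cursor position report of the form ESC [ row ; col R.
// Returns 0 on success, -1 if the reply is malformed.
int parse_cpr(const char *reply, int *row, int *col)
{
  char end = 0;

  if (sscanf(reply, "\033[%d;%d%c", row, col, &end) != 3 || end != 'R')
    return -1;
  return 0;
}

// Send the query and read the reply one byte at a time until 'R'.
// Assumes fd is a tty with ICANON and ECHO already switched off.
// Returns the 1-based column, or -1 on failure.
int query_column(int fd)
{
  char buf[32];
  int len = 0, row, col;

  if (write(fd, "\033[6n", 4) != 4) return -1;
  while (len < (int)sizeof(buf)-1) {
    if (read(fd, buf+len, 1) != 1) return -1;
    if (buf[len++] == 'R') break;
  }
  buf[len] = 0;

  return parse_cpr(buf, &row, &col) ? -1 : col;
}

// Report how many columns the cursor advanced after printing s,
// or -1 on failure. Terminal settings are restored before returning.
int measure_columns(int fd, const char *s)
{
  struct termios old, raw;
  int before = -1, after = -1;
  size_t len = strlen(s);

  if (tcgetattr(fd, &old)) return -1;
  raw = old;
  raw.c_lflag &= ~(ICANON|ECHO); // read the reply without waiting for newline
  if (tcsetattr(fd, TCSANOW, &raw)) return -1;

  if (write(fd, "\r", 1) == 1) before = query_column(fd);
  if (before > 0 && write(fd, s, len) == (ssize_t)len)
    after = query_column(fd);

  tcsetattr(fd, TCSANOW, &old);

  return (before < 0 || after < 0) ? -1 : after - before;
}
```

Something like measure_columns(0, "\xef\xb7\xbd") would then ask the terminal how wide it thinks U+FDFD is. The answer is whatever that particular terminal decided, which is exactly the disagreement the thread is about.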
>
> Anyway, why is this NOT a couple bitmaps for 0 and 1 and an if/else staircase
> for oddballs, else size 2. I'm aware the xfce terminal isn't exactly
> canonical, and maybe it's printing something when it shouldn't, but this is
> the question I'm trying to ask with wcwidth(). When I print this, how many
> columns does that consume on the terminal? It's giving a width to these
> characters.

(see the harfbuzz documentation for why "character width" isn't a
meaningful concept for all the world's scripts :-) )

> > (bionic's wcwidth() just passes on the Unicode
> > EastAsianWidth property, which isn't _useless_ but it's way too
> > simplistic a model to handle stuff like this.)
>
> There are currently 149,813 unicode characters and the largest possible width
> is what, 7? So 3 bits each, 56k for a naive implementation.
>
> The thing that confuses me is this seems like it would HAVE an objective
> answer...
>
> Rob

_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
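
The "3 bits each, 56k" estimate in Rob's message checks out arithmetically: 149,813 entries at 3 bits is 449,439 bits, which rounds up to 56,180 bytes. A minimal sketch of such a packed table follows. It assumes a dense index 0..n-1 over the assigned characters (the mapping from code point to index is not shown and would need its own lookup), and the names packed_size(), packed_set() and packed_get() are invented for illustration. It says nothing about whether a per-character width is the right model, which is the harfbuzz objection above.

```c
#include <stddef.h>
#include <stdint.h>

// Bytes needed to store n widths at 3 bits each, rounded up.
size_t packed_size(size_t n)
{
  return (n*3 + 7)/8;
}

// Store width (0-7) for entry idx. Entries may straddle a byte
// boundary, so each of the 3 bits is placed individually.
void packed_set(uint8_t *table, size_t idx, unsigned width)
{
  size_t bit = idx*3;
  int i;

  for (i = 0; i < 3; i++, bit++) {
    uint8_t mask = (uint8_t)(1u << (bit & 7));

    if (width & (1u << i)) table[bit>>3] |= mask;
    else table[bit>>3] &= (uint8_t)~mask;
  }
}

// Fetch the width (0-7) stored for entry idx.
unsigned packed_get(const uint8_t *table, size_t idx)
{
  size_t bit = idx*3;
  unsigned width = 0;
  int i;

  for (i = 0; i < 3; i++, bit++)
    if (table[bit>>3] & (1u << (bit & 7))) width |= 1u << i;

  return width;
}
```

Here packed_size(149813) comes to 56180. Indexing directly by code point over the full 0x110000 range instead would take 417,792 bytes with the same scheme, still small by modern standards.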