On Wed, Nov 22, 2023 at 11:04 AM Rob Landley <r...@landley.net> wrote:
>
> Wow, how long has THIS one been buried behind other windows? (Trying to finally
> reboot my laptop so I can upgrade stuff...)
>
> On 10/11/23 11:13, enh wrote:
> > On Wed, Oct 11, 2023 at 3:22 AM Rob Landley <r...@landley.net> wrote:
> >>
> >> On 10/6/23 05:05, Rob Landley wrote:
> >> > Apparently the widest unicode characters are:
> >> >
> >> > 1. ﷽
> >> >
> >> > 2. 𒐫
> >> >
> >> > 3. 𒈙
> >> >
> >> > 4. ⸻
> >> >
> >> > 5. ꧅
> >> >
> >> > The first 4 of which xfce's terminal does NOT like. And thunderbird fits the
> >> > first one in 3 columns while vim's giving it... 9 I think.
> >>
> >> And trying to add a file with those to the test suite, neither glibc nor musl is
> >> returning wcwidth() for them (it's all 1). And washing the attempt through
> >> ltrace it looks like their unicode code points aren't defined in
> >> https://www.w3.org/TR/xml-entity-names/1D7.html and friends.
> >>
> >> Which is odd because the web browser and terminal and so on render them
> >> properly. But if neither glibc nor musl can handle them, I can't add "fold"
> >> tests for them, can I? (Haven't tried bionic, but possibly this is what Elliott
> >> meant when he said he used a bigger gui library for this sort of thing...)
> >
> > yeah, bionic _does_ implement wcwidth() but admits that it's fairly
> > bogus.
>
> I miss java 1.1's fontmetrics with the awt and lightweight canvas where we just
> wrote our own widget set and it worked. It was the first graphical toolkit I'd
> actually been _comfortable_ with since logo. (And I say that having learned
> IBM's System Object Model in order to maintain a project implemented as a subclass
> of the OS/2 Workplace Shell's "folder" class.)
>
> Pity they added swing (hell no), and then Sun screwed over blackdown so hard I
> fled screaming from the entire language...
>
> > if you really care, not even icu4c (my usual answer to such
> > questions, and something bionic regularly forwards such questions to),
> > you want to talk to something like
> > https://en.wikipedia.org/wiki/HarfBuzz instead --- this shit gets
> > weird, fast.
>
> Yes, but that's not really the question I'm asking.

no, but it's the question you actually _need_ to ask if you're worried
about doing something _useful_ --- it's probably better to think of
some scripts as "nothing but combining characters".

> How often do new unicode
> tables come out and do they ever really make big changes?

"about one/year" [citation needed?
https://en.wikipedia.org/wiki/Unicode#Versions]

> There are only 1.1
> million possible values, this is not a big table of numbers in a modern
> computing context, and there presumably ARE answers?

my point is that it's the _combinations_ that are interesting. that's
why i mentioned harfbuzz.
https://harfbuzz.github.io/why-do-i-need-a-shaping-engine.html is a
good high-level intro (the paragraph containing the word "arabic" in
particular).

> [scribble scribble scribble...]
>
> The attached fontmetrics.c prints each character and asks the terminal (in this
> case xfce's) how many columns the cursor moved, using the query cursor position
> ANSI escape sequence. You run it ala "./fontmetrics > out.txt" and then leave
> that terminal alone for a while. (Alas with "| tee out.txt" instead of a
> redirect the conflicting writes to stdin and stdout occasionally glitch
> slightly.) The results are 0 columns, 1 column, 2 columns, and everything else.
> It's still running (slow) but so far I've got:
>
> $ for i in 0 1 2 '-v =[012]'; do grep $i'$' out.txt | wc -l; done
> 1160
> 10734
> 15891
> 2
>
> And those two weirdos are:
>
> $ grep -v '=[012]' out.txt
> 9=8
> 89=8
>
> And two of those "else" are tab (which is weird) and enter (which I think
> confused it, partly because it was in raw mode so it could read the returned
> sequences without waiting for a newline).
>
> I'm not quite sure what's up with 0x89, but:
>
> $ toybox unicode 0x88
> U+0088 :  : 0xc2 0x88
> landley@driftwood:~/toybox/toybox$ toybox unicode 0x89
> U+0089 :         : 0xc2 0x89
>
> I mean yeah, I'm seeing it. (High tab?) Haven't poked much yet.
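
(for what it's worth, U+0088 and U+0089 are C1 controls: HTS, "character
tabulation set", and HTJ, "character tabulation with justification". the 89=8
line above suggests the terminal treats HTJ like a plain tab, though i haven't
checked its source. a minimal sketch of the encoding step that output shows,
using hypothetical helper names utf8_encode() and is_c1_control() that are not
taken from toybox, and ignoring surrogates:)

```c
#include <assert.h>

/* Encode one code point as UTF-8. Returns the byte count (1-4), or 0 if the
 * value is above U+10FFFF. Sketch only: surrogates are not rejected. */
int utf8_encode(unsigned cp, unsigned char out[4])
{
  assert(out);
  if (cp < 0x80) {
    out[0] = cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = 0xc0 | (cp >> 6);
    out[1] = 0x80 | (cp & 0x3f);
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = 0xe0 | (cp >> 12);
    out[1] = 0x80 | ((cp >> 6) & 0x3f);
    out[2] = 0x80 | (cp & 0x3f);
    return 3;
  }
  if (cp < 0x110000) {
    out[0] = 0xf0 | (cp >> 18);
    out[1] = 0x80 | ((cp >> 12) & 0x3f);
    out[2] = 0x80 | ((cp >> 6) & 0x3f);
    out[3] = 0x80 | (cp & 0x3f);
    return 4;
  }
  return 0;
}

/* The C1 control block is U+0080 through U+009F. */
int is_c1_control(unsigned cp)
{
  return cp >= 0x80 && cp <= 0x9f;
}
```

utf8_encode(0x89, buf) fills buf with 0xc2 0x89, matching the output above.
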
>
> Anyway, why is this NOT a couple bitmaps for 0 and 1 and an if/else staircase
> for oddballs, else size 2? I'm aware the xfce terminal isn't exactly canonical,
> and maybe it's printing something when it shouldn't, but this is the question
> I'm trying to ask with wcwidth(). When I print this, how many columns does that
> consume on the terminal? It's giving a width to these characters.

(see the harfbuzz documentation for why "character width" isn't a
meaningful concept for all the world's scripts :-) )
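
for anyone following along without the attachment: the cursor position query is
ESC [ 6 n, and the terminal answers ESC [ row ; col R on its input side. a
minimal sketch of the reply parsing, assuming the whole reply has already been
read into a NUL-terminated buffer in raw mode. parse_cpr() and CPR_QUERY are
illustrative names, not taken from fontmetrics.c:

```c
#include <assert.h>
#include <stdio.h>

/* Device status report 6: ask the terminal where the cursor is. */
#define CPR_QUERY "\033[6n"

/* Parse a reply such as "\033[12;40R" into 1-based row and column.
 * Returns 1 on success, 0 if the reply is malformed or truncated. */
int parse_cpr(const char *reply, int *row, int *col)
{
  char end = 0;

  assert(reply && row && col);
  /* %c captures the final byte so we can check that it really is 'R'. */
  if (sscanf(reply, "\033[%d;%d%c", row, col, &end) != 3) return 0;
  return end == 'R';
}
```

the measurement is then: send CPR_QUERY, print the character, send CPR_QUERY
again, and subtract the two columns (assuming the cursor didn't wrap to the next
line in between).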

> > (bionic's wcwidth() just passes on the Unicode
> > EastAsianWidth property, which isn't _useless_ but it's way too
> > simplistic a model to handle stuff like this.)
>
> There are currently 149,813 unicode characters and the largest possible width is
> what, 7? So 3 bits each, 56k for a naive implementation.
>
> The thing that confuses me is this seems like it would HAVE an objective answer...
>
> Rob
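
the arithmetic checks out: 149,813 entries at 3 bits each is 449,439 bits, or
56,180 bytes. a minimal sketch of such a packed table, assuming entries are
indexed by some dense ordinal rather than by raw code point (indexing by code
point up to U+10FFFF would need 1,114,112 entries, about 418 kB). the packed_*
names are made up for illustration, and none of this addresses the combining
behaviour discussed above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define WIDTH_BITS 3

/* Bytes needed for n entries, plus one pad byte so the two-byte window used
 * below never reads or writes past the end of the table. */
size_t packed_size(size_t n)
{
  return (n * WIDTH_BITS + 7) / 8 + 1;
}

/* Allocate a zeroed table for n entries; the caller frees it with free(). */
uint8_t *packed_new(size_t n)
{
  return calloc(packed_size(n), 1);
}

/* Read the 3-bit width stored for entry i. */
unsigned packed_get(const uint8_t *tab, size_t i)
{
  size_t bit = i * WIDTH_BITS;
  unsigned window = tab[bit / 8] | ((unsigned)tab[bit / 8 + 1] << 8);

  return (window >> (bit % 8)) & 7;
}

/* Store a width (0-7) for entry i without disturbing its neighbours. */
void packed_set(uint8_t *tab, size_t i, unsigned width)
{
  size_t bit = i * WIDTH_BITS;
  unsigned shift = bit % 8;
  unsigned window = tab[bit / 8] | ((unsigned)tab[bit / 8 + 1] << 8);

  assert(width < 8);
  window = (window & ~(7u << shift)) | (width << shift);
  tab[bit / 8] = window & 0xff;
  tab[bit / 8 + 1] = window >> 8;
}
```

packed_size(149813) comes to 56,181 bytes including the pad byte.
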
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net