On 1/2/24 17:21, enh wrote:
>> > if you really care, not even icu4c (my usual answer to such
>> > questions, and something bionic regularly forwards such questions to),
>> > you want to talk to something like
>> > https://en.wikipedia.org/wiki/HarfBuzz instead --- this shit gets
>> > weird, fast.
>>
>> Yes, but that's not really the question I'm asking.
> 
> no, but it's the question you actually _need_ to ask if you're worried
> about doing something _useful_

I'm worried about implementing unicode-aware interactive line editing for toysh,
which may someday get retrofitted onto the vi implementation but for now that's
not my problem.

The way I _thought_ fold worked is how line editing has to work: backspace
undoes the previous character, including jumping back to the start of variable
width tabs, so I've got to checkpoint the previous position for backspace to
return to.
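
A minimal sketch of that checkpointing idea, not toysh's actual code: record the cursor column before each character is added, so backspace can pop back to it. The names (`struct line`, `line_add`, `line_backspace`), the fixed `MAXLINE` limit, and the 8-column tab stops are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAXLINE 256   // hypothetical fixed limit for this sketch
#define TABSTOP 8     // assumes conventional 8-column tab stops

struct line {
  int cols[MAXLINE];  // cols[i] = cursor column before character i was added
  size_t len;         // number of characters currently in the line
  int col;            // current cursor column
};

// Checkpoint the current column, then advance past this character.
// width is the caller's column count for the character (0, 1, or 2);
// a tab ignores it and jumps to the next tab stop instead.
// Returns the new column, or -1 if the line is full.
int line_add(struct line *l, int ch, int width)
{
  if (l->len >= MAXLINE) return -1;
  l->cols[l->len++] = l->col;
  if (ch == '\t') l->col += TABSTOP - (l->col % TABSTOP);
  else l->col += width;

  return l->col;
}

// Undo the most recent character by returning to its checkpoint.
// Returns how many columns the cursor has to move left.
int line_backspace(struct line *l)
{
  int moved;

  if (!l->len) return 0;
  moved = l->col - l->cols[--l->len];
  l->col -= moved;

  return moved;
}
```

The point of storing columns rather than recomputing them is that a tab's width depends on where it started, so it cannot be recovered from the character alone.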

There are various horrible alternatives, including sending the ANSI position
query after every keystroke, or jumping to the left edge and rewriting the
entire line each time with a "clear to end of line" sequence at the end, but
I'd rather use a solution that ISN'T crazy.
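
For reference, a rough sketch of what those two alternatives involve. The escape sequences are the standard ANSI ones (ESC [ 6 n asks for the cursor position and the terminal answers ESC [ row ; col R, while ESC [ K clears to end of line). The function names are hypothetical, and the terminal I/O itself (raw mode, reading the reply) is left out.

```c
#include <stdio.h>

// Cursor position query: the terminal replies with ESC [ row ; col R
#define CURSOR_QUERY "\033[6n"

// Parse a cursor position reply. Stores 1-based row and column.
// Returns 0 on success, -1 if the reply is malformed or incomplete.
int parse_cursor_report(const char *reply, int *row, int *col)
{
  char end = 0;

  if (sscanf(reply, "\033[%d;%d%c", row, col, &end) != 3 || end != 'R')
    return -1;

  return 0;
}

// The "rewrite everything" alternative: carriage return to the left edge,
// reprint the whole line, then clear whatever is left over to end of line.
// Returns the length snprintf() reports for the formatted string.
int format_redraw(char *out, size_t size, const char *line)
{
  return snprintf(out, size, "\r%s\033[K", line);
}
```

Both work, but the first costs a round trip to the terminal per keystroke and the second retransmits the whole line each time.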

> --- it's probably better to think of
> some scripts as "nothing but combining characters".

Then what do they combine _with_?

I tried putting an umlaut on low ascii characters. It didn't even work with
"tab"...

>> How often do new unicode
>> tables come out and do they ever really make big changes?
> 
> "about one/year" [citation needed?
> https://en.wikipedia.org/wiki/Unicode#Versions]
> 
>> There are only 1.1
>> million possible values, this is not a big table of numbers in a modern
>> computing context, and there presumably ARE answers?
> 
> my point is that it's the _combinations_ that are interesting. that's
> why i mentioned harfbuzz.
> https://harfbuzz.github.io/why-do-i-need-a-shaping-engine.html is a
> good high-level intro (the paragraph containing the word "arabic" in
> particular).

Um... if combining characters change the width of the base character, I think
I'm just plain gonna get the fontmetrics wrong there. I don't see how I can
avoid it.

>> Anyway, why is this NOT a couple bitmaps for 0 and 1 and an if/else staircase
>> for oddballs, else size 2? I'm aware the xfce terminal isn't exactly
>> canonical, and maybe it's printing something when it shouldn't, but this is
>> the question I'm trying to ask with wcwidth(). When I print this, how many
>> columns does that consume on the terminal? It's giving a width to these
>> characters.
> 
> (see the harfbuzz documentation for why "character width" isn't a
> meaningful concept for all the world's scripts :-) )

Then I can't support all the world's scripts.

The perfect is the enemy of the good. I want to figure out the subset I _can_
support. And right now, it's not handling japanese.

If I have to make simplifying assumptions, then "low ascii is weird", and every
other unicode codepoint is either 0, 1, or 2 columns wide, and maybe I need to
handle the right to left direction switching codepoints but I'm not entirely
sure how.
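
A sketch of what that simplified model might look like as an if/else staircase. The ranges below are a small hand-picked subset chosen for illustration, not the Unicode width tables, and have known gaps. The name `cp_width` is hypothetical, and unlike the "else size 2" idea above, this version falls through to 1.

```c
// Illustrative only: a few well-known ranges, not a complete table.
// Returns terminal columns (0, 1, or 2), or -1 for control characters
// that the caller has to handle case by case.
int cp_width(unsigned cp)
{
  // "Low ascii is weird": tab, newline, escape and friends have no fixed width.
  if (cp < 32 || cp == 127) return -1;
  if (cp < 127) return 1;
  if (cp >= 0x80 && cp < 0xa0) return -1;       // C1 control characters

  // Zero width: combining diacritical marks, then zero-width space and
  // joiners plus the left-to-right and right-to-left marks.
  if (cp >= 0x300 && cp <= 0x36f) return 0;
  if (cp >= 0x200b && cp <= 0x200f) return 0;

  // Double width: common East Asian blocks.
  if (cp >= 0x1100 && cp <= 0x115f) return 2;   // Hangul Jamo leading consonants
  if (cp >= 0x3040 && cp <= 0x30ff) return 2;   // Hiragana and Katakana
  if (cp >= 0x4e00 && cp <= 0x9fff) return 2;   // CJK Unified Ideographs
  if (cp >= 0xac00 && cp <= 0xd7a3) return 2;   // Hangul syllables
  if (cp >= 0xff01 && cp <= 0xff60) return 2;   // Fullwidth forms

  return 1;
}
```

This treats the direction marks as zero-width and otherwise ignores them, which answers "how many columns" but not "in which direction".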

It sounds like getting this perfect is a full-time job for a dedicated domain
expert, and even they can't package it up in a useful fashion so people who
AREN'T domain experts can ask simple questions that get answers. (If the unicode
consortium produced a mess that goes non-Euclidean in places, I only have so
much brain to try to understand the results with.)

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
