On Tue, Jan 2, 2024 at 4:28 PM Rob Landley <r...@landley.net> wrote:
>
> On 1/2/24 17:21, enh wrote:
> >> > if you really care, not even icu4c (my usual answer to such
> >> > questions, and something bionic regularly forwards such questions to),
> >> > you want to talk to something like
> >> > https://en.wikipedia.org/wiki/HarfBuzz instead --- this shit gets
> >> > weird, fast.
> >>
> >> Yes, but that's not really the question I'm asking.
> >
> > no, but it's the question you actually _need_ to ask if you're worried
> > about doing something _useful_
>
> I'm worried about implementing unicode-aware interactive line editing for
> toysh, which may someday get retrofitted onto the vi implementation but for
> now that's not my problem.
>
> The way I _thought_ fold worked is how line editing has to work: backspace
> undoes the previous character, including jumping back to the start of variable
> width tabs, so I've got to checkpoint the previous position for backspace to
> return to.
>
> There are various horrible alternatives, including sending the ANSI position
> query after every keystroke or jumping to the left edge and rewriting the
> entire line each time with a "clear to end of line" sequence at the end, but
> I'd rather use a solution that ISN'T crazy.
>
> > --- it's probably better to think of
> > some scripts as "nothing but combining characters".
>
> Then what do they combine _with_?

https://github.com/n8willis/opentype-shaping-documents/blob/master/opentype-shaping-arabic.md

> I tried putting an umlaut on low ascii characters. It didn't even work with 
> "tab"...
>
> >> How often do new unicode
> >> tables come out and do they ever really make big changes?
> >
> > "about one/year" [citation needed?
> > https://en.wikipedia.org/wiki/Unicode#Versions]
> >
> >> There are only 1.1
> >> million possible values, this is not a big table of numbers in a modern
> >> computing context, and there presumably ARE answers?
> >
> > my point is that it's the _combinations_ that are interesting. that's
> > why i mentioned harfbuzz.
> > https://harfbuzz.github.io/why-do-i-need-a-shaping-engine.html is a
> > good high-level intro (the paragraph containing the word "arabic" in
> > particular).
>
> Um... if combining characters change the width of the base character, I think
> I'm just plain gonna get the fontmetrics wrong there. I don't see how I can
> avoid it.
>
> >> Anyway, why is this NOT a couple bitmaps for 0 and 1 and an if/else
> >> staircase for oddballs, else size 2? I'm aware the xfce terminal isn't
> >> exactly canonical, and maybe it's printing something when it shouldn't,
> >> but this is the question I'm trying to ask with wcwidth(). When I print
> >> this, how many columns does that consume on the terminal? It's giving a
> >> width to these characters.
> >
> > (see the harfbuzz documentation for why "character width" isn't a
> > meaningful concept for all the world's scripts :-) )
>
> Then I can't support all the world's scripts.
>
> The perfect is the enemy of the good. I want to figure out the subset I _can_
> support. And right now, it's not handling japanese.
>
> If I have to make simplifying assumptions, then "low ascii is weird", and
> every other unicode codepoint is either 0, 1, or 2 columns wide, and maybe I
> need to handle the right to left direction switching codepoints but I'm not
> entirely sure how.
>
> It sounds like getting this perfect is a full-time job for a dedicated domain
> expert, and even they can't package it up in a useful fashion so people who
> AREN'T domain experts can ask simple questions that get answers. (If the
> unicode consortium produced a mess that goes non-euclidean in places, I only
> have so much brain to try to understand the results with.)

right, but then i'm back to "why don't you just trust wcwidth() and
move on with your life?" :-)

isn't that all the competition is doing? (i actually have no idea ---
i don't speak any rtl languages, so korean is the most exotic thing
i've ever done at the prompt, and that's not really any more
complicated than german in this sense.)
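
for what it's worth, here is a minimal sketch of what "trust wcwidth()" could look like for the backspace checkpointing described above. it is not toybox code: char_cols(), line_cols(), and TABSTOP are hypothetical names, the 8-column tab stop is an assumption, and counting whatever wcwidth() rejects as one column is an arbitrary choice. it treats every codepoint on its own, so it makes no attempt at the shaping problems harfbuzz exists for.

```c
#define _XOPEN_SOURCE 700  // glibc only declares wcwidth() with this set
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#define TABSTOP 8  // assumption: terminal tab stops every 8 columns

// Hypothetical helper: columns consumed by wc when printed at column col.
// Tabs advance to the next tab stop, everything else asks wcwidth(), and
// characters wcwidth() rejects (-1, e.g. control codes) count as 1 column.
static int char_cols(wchar_t wc, int col)
{
  int w;

  if (wc == L'\t') return TABSTOP - (col % TABSTOP);
  w = wcwidth(wc);

  return w < 0 ? 1 : w;
}

// Hypothetical helper: record in starts[] the column where each character
// of s begins, so backspacing over character i can return to starts[i].
// Returns the number of characters recorded; *end gets the final column.
static int line_cols(const char *s, int *starts, int max, int *end)
{
  mbstate_t st;
  size_t left = strlen(s), len;
  int col = 0, n = 0;
  wchar_t wc;

  memset(&st, 0, sizeof(st));
  while (left && n < max) {
    len = mbrtowc(&wc, s, left, &st);
    if (len == (size_t)-1 || len == (size_t)-2) {
      // Invalid or truncated sequence: consume one byte as one column.
      memset(&st, 0, sizeof(st));
      wc = L'?';
      len = 1;
    }
    starts[n++] = col;
    col += char_cols(wc, col);
    s += len;
    left -= len;
  }
  *end = col;

  return n;
}
```

a caller would need setlocale(LC_ALL, "") first for anything beyond ascii to decode, and a zero-width combining character gets the same start column as whatever follows it, so backspacing over one would need to step back to the previous nonzero-width entry.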

> Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
