Re: questions with combining characters [was: Unicode: endpoint of evolution of encodings?]

Antoine Leca Wed, 17 Nov 2004 04:57:42 -0800

Egmont Koblinger wrote:
>
> Slightly based on it, I have some questions with the combining
> characters. It's clear to me how they should be handled if the
> complete text to be displayed is known in advance. But I don't know
> what has to be done if one tries to display a real-time text flow.


Is it a problem with displaying? Or rather a problem with the protocol?


> Just think of a talk/ytalk enhancement working with UTF-8 encoding
> and NFD representation. And network lags...
>
> Maybe I type an "á", first "a" is sent over the network, then for some
> reason some packets are lost or there's a short network failure, and
> the combining acute is only sent five seconds later. The receiver
> party has to first display an "a" since it doesn't know it's going to
> be continued.

Sure.

> Then later it has to be able to put an accent over the
> already displayed character.

This is where I am not so sure.
In the RFC about Japanese email, there is a piece of _redundant_ information
placed at the beginning of each line: the introducer that announces that
the following text will be Japanese. The rationale behind this redundancy is
obviously to allow software to deal with a line without needing the full
context.

Similarly, I would consider that a combining mark such as U+0301 (the
combining acute), if it is received detached from its base, could perhaps
simply not be retained for display. Of course, if this is a problem (that
is, if we are not speaking about IRC chat), then the protocol should be
hardened, perhaps by preventing such orphans, or by resending the whole
line, etc.
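
A minimal sketch of the "do not retain the orphan" option, in Python. It
assumes the receiver handles one packet of already-decoded text at a time;
the function name drop_orphan_marks is hypothetical, and treating every
character of general category M* as a combining mark is a simplification.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me cover the combining marks.
    return unicodedata.category(ch).startswith("M")


def drop_orphan_marks(packet: str) -> str:
    """Discard combining marks that open a packet, i.e. that arrived
    without the base character they belong to (hypothetical policy)."""
    i = 0
    while i < len(packet) and is_mark(packet[i]):
        i += 1
    return packet[i:]


# The late U+0301 is dropped; a mark that follows its base is kept.
late_packet = drop_orphan_marks("\u0301 bientot")
intact_packet = drop_orphan_marks("a\u0301")
```

A hardened protocol would instead never split a base from its marks on the
sending side, which makes this filter unnecessary.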


> What is backspace supposed to do with NFD unicode streams?

Backspace is a control function; its behaviour is explicitly outside the
scope of Unicode.
I would say the definition is to be given by the protocol that defines BS as
an "erasing" character; so in a keyboard driver it should probably undo the
last keystroke, while on a printer it may function as overstriking (as in
CCITT's original), to deal with e.g. daisywheels.

> Should it
> delete one unicode entity (that is only the accent from the top of a
> letter) or a complete combined character?

Both look like perfectly acceptable ways to deal with it.
For example, IMHO the second might be a good choice if received by your
talk/ytalk enhancement.
But clearly it is the job of your enhancement to specify this kind of
behaviour, as part of the design of the protocol.
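
To make the two options concrete, here is a rough Python sketch of both
backspace policies applied to an NFD string. The function names are
hypothetical, and "base plus trailing marks" is only an approximation of a
full grapheme cluster; a real protocol would have to say which rule applies.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me cover the combining marks.
    return unicodedata.category(ch).startswith("M")


def backspace_code_point(text: str) -> str:
    """Policy 1: erase only the last code point (e.g. just the accent)."""
    return text[:-1]


def backspace_combined(text: str) -> str:
    """Policy 2: erase the last base character with all of its marks."""
    end = len(text)
    # Skip backwards over the trailing combining marks...
    while end > 0 and is_mark(text[end - 1]):
        end -= 1
    # ...then remove the base character they were attached to, if any.
    return text[:max(end - 1, 0)]


nfd = unicodedata.normalize("NFD", "x\u00e1")   # "x", "a", U+0301
only_accent_removed = backspace_code_point(nfd)  # leaves "xa"
whole_letter_removed = backspace_combined(nfd)   # leaves "x"
```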


Antoine


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
