Egmont Koblinger wrote:

> Slightly based on it, I have some questions with the combining
> characters. It's clear to me how they should be handled if the
> complete text to be displayed is known in advance. But I don't know
> what has to be done if one tries to display a real-time text flow.

Is it a problem with displaying? Or rather a problem with the protocol?

> Just think of a talk/ytalk enhancement working with UTF-8 encoding
> and NFD representation. And network lags...
>
> Maybe I type an "á", first "a" is sent over the network, then for some
> reason some packets are lost or there's a short network failure, and
> the combining acute is only sent five seconds later. The receiver
> party has to first display an "a" since it doesn't know it's going to
> be continued.

Sure.

> Then later it has to be able to put an accent over the
> already displayed character.

This is where I am not so sure. In the RFC about Japanese email, there
is _redundant_ information at the beginning of each line: the
introducer that announces that the following text will be Japanese.
The rationale behind this redundancy is obviously to allow software to
deal with a line without needing the full context.

Similarly, I would consider that a received U+0301 (the combining
acute) that has become detached from its base could perhaps simply not
be retained for display. Of course, if this is a problem (that is, if
we are not speaking about IRC chat), then the protocol should be
hardened, perhaps by preventing such orphans, or resending the whole
line, etc.

> What is backspace supposed to do with NFD unicode streams?

Backspace is a control function; this is explicitly outside Unicode. I
would say the definition is to be given by the protocol that defines BS
as an "erasing" character; so in a keyboard driver it should probably
undo the last keystroke, while on a printer it may function as
overstriking (as in CCITT's original), to deal with e.g. daisywheels.

> Should it
> delete one unicode entity (that is only the accent from the top of a
> letter) or a complete combined character?

These look like two perfectly acceptable ways to deal with it. For
example, IMHO the second might be a good choice if received by your
talk/ytalk enhancement.
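
To make the two backspace policies and the "drop the orphan" idea
concrete, here is a rough sketch in Python of a receiver-side line
buffer. Everything in it is an assumption made for illustration: the
class name LineBuffer, the delete_whole_cluster flag and the feed()
interface are hypothetical and not part of talk/ytalk. It also uses the
canonical combining class as a crude test for "is a combining mark"; a
real implementation would more likely follow the grapheme cluster rules
of UAX #29.

```python
import unicodedata


class LineBuffer:
    """Hypothetical receiver-side buffer for one line of an NFD stream."""

    def __init__(self, delete_whole_cluster=True):
        # Code points received so far on the current line.
        self.chars = []
        # True:  BS erases the base letter together with its marks.
        # False: BS erases only the last code point (e.g. just the accent).
        self.delete_whole_cluster = delete_whole_cluster

    def feed(self, ch):
        """Process one received code point."""
        if ch == "\b":
            self._backspace()
        elif unicodedata.combining(ch) and not self.chars:
            # Orphan combining mark: no base on this line, so it is
            # silently dropped instead of being displayed.
            return
        else:
            # A late combining mark simply lands after its base; the
            # display layer is assumed to redraw that cell.
            self.chars.append(ch)

    def _backspace(self):
        if not self.chars:
            return
        last = self.chars.pop()
        if self.delete_whole_cluster:
            # Keep popping until the base character itself is removed.
            while unicodedata.combining(last) and self.chars:
                last = self.chars.pop()

    def text(self):
        return "".join(self.chars)
```

Feeding "a", then (seconds later) U+0301, leaves "a" + U+0301 in the
buffer, so the accent ends up over the letter already shown. A
following BS then empties the buffer when delete_whole_cluster is true,
and leaves a bare "a" when it is false.
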

But clearly it is the job of your enhancement to describe this kind of
behaviour, as part of the design of the protocol.

Antoine

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
