2009/7/27 Geoff <[email protected]>:
> No, I'm thinking about 0x20 (space) to 0x7E with upper
> and lower case ABCDEFGHIJKLMNOPQRSTUVWXYZ, 0123456789,
> and the US ASCII punctuation from 0x21 <-> 0x2F and 0x7B <-> 0x7E.
> That is the common character set for keyboard input
> and printable output across all the systems I use.

> I understand that many people don't have US-ASCII keyboards
> or displays and find the limitation to that character set
> a problem. Still, the ability to map character/byte/octet
> values to and from visible marks in a completely consistent
> manner is valuable to me and perhaps to other people as well.

> Unicode is an extremely unpleasant thing I'm avoiding until
> the worth of the outcome exceeds the pain of conversion.
> For instance, will the world settle on a compressed form
> using the escape convention or will all data change to 16-bit?
> Or both?

Read this: http://en.wikipedia.org/wiki/UTF-8#Description

UTF-8 uses a single 8-bit byte for each US-ASCII-compatible character.
Anything beyond that takes 2, 3, or 4 bytes per character. But as long
as the topmost bit of a byte is zero, you're looking at a completely
US-ASCII-compatible single-byte character. The 16+ bit characters only
start where that topmost bit is no longer zero.
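
To make the top-bit rule concrete, here is a minimal Python sketch. It is
only an illustration: the function names utf8_sequence_length and
is_pure_ascii are made up for this mail, and the code does not fully
validate UTF-8 (it only looks at the leading bits of a byte).

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with lead_byte has."""
    if (lead_byte & 0x80) == 0x00:   # 0xxxxxxx: plain US-ASCII, one byte
        return 1
    if (lead_byte & 0xE0) == 0xC0:   # 110xxxxx: start of a 2-byte sequence
        return 2
    if (lead_byte & 0xF0) == 0xE0:   # 1110xxxx: start of a 3-byte sequence
        return 3
    if (lead_byte & 0xF8) == 0xF0:   # 11110xxx: start of a 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte, anything else is not a lead byte.
    raise ValueError("not a UTF-8 lead byte: 0x%02X" % lead_byte)


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte has its top bit clear, i.e. the data is US-ASCII."""
    return all((b & 0x80) == 0 for b in data)
```

So text restricted to 0x20..0x7E is byte-for-byte the same in US-ASCII
and in UTF-8; is_pure_ascii() returns True for it and nothing needs
converting.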

Btw., in the first table currently shown at that link (have I talked
about this before?), the underlined parts in the example cells are
just to illustrate which parts of the hexadecimal Unicode code points
correspond to which bits in the binary representation on the following
line.
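
If it helps, the same mapping from code point bits to bytes can be done
by hand. The following is a rough Python sketch under my own
assumptions: encode_utf8 and bits are hypothetical helper names, and the
code skips the checks a real encoder needs (for example, it does not
reject the surrogate range U+D800..U+DFFF).

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by shifting and masking."""
    if cp < 0x80:                     # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                    # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                  # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x110000:                 # up to 21 bits: 11110xxx plus three 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


def bits(data: bytes) -> str:
    """Show each byte as 8 binary digits, separated by spaces."""
    return " ".join(f"{b:08b}" for b in data)


print(bits(encode_utf8(0x20AC)))  # prints: 11100010 10000010 10101100
```

U+20AC is 0010 0000 1010 1100 in binary; those 16 bits get split 4+6+6
and dropped into the x positions of the 3-byte pattern.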

But this is probably getting OT, because AFAIK UTF-8 support isn't in
OpenBSD yet...

regards,
--ropers
