>From [email protected]  Mon Jul 27 10:07:03 2009
Received: from mail-fx0-f215.google.com (mail-fx0-f215.google.com 
[209.85.220.215])
        by lib.oat.com (8.14.3/8.14.3) with ESMTP id n6RE6wFn018594
        for <[email protected]>; Mon, 27 Jul 2009 10:07:02 -0400 (EDT)
Received: by fxm11 with SMTP id 11so2735613fxm.18
        for <[email protected]>; Mon, 27 Jul 2009 07:06:52 -0700 (PDT)
Received: by 10.204.70.19 with SMTP id b19mr3168590bkj.62.1248703612472; Mon, 
        27 Jul 2009 07:06:52 -0700 (PDT)
Subject: Re: locale en_US.ASCII?
To: Geoff <[email protected]>
Cc: [email protected]

2009/7/27 Geoff <[email protected]>:
>> Is there a reason the en_US.ASCII locale is not available in
>> the standard distribution?

On Mon, 27 Jul 2009 16:06:52 +0200, ropers <[email protected]> wrote:
>Well, UTF-8 is backwards-compatible with US-ASCII. Or maybe when you
>said "US-ASCII" you were really thinking of ye olde Code Page 437?
>http://en.wikipedia.org/wiki/Code_page_437

No, I'm thinking of 0x20 (space) through 0x7E: upper- and
lower-case ABCDEFGHIJKLMNOPQRSTUVWXYZ, the digits 0123456789,
and the US-ASCII punctuation (0x21-0x2F, 0x3A-0x40, 0x5B-0x60,
and 0x7B-0x7E).
That is the common character set for keyboard input
and printable output across all the systems I use.
If all my keyboards, displays, and listing printers used a
consistent ISO 8859-1 set (for example), there wouldn't be
much of a problem.
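
For concreteness, the repertoire I mean is exactly what this
little C sketch prints (nothing system-specific assumed):

    /* Print the printable US-ASCII repertoire, 0x20 (space)
     * through 0x7E (~), sixteen characters per line. */
    #include <stdio.h>

    int main(void)
    {
        for (int c = 0x20; c <= 0x7E; c++) {
            putchar(c);
            if ((c - 0x1F) % 16 == 0)
                putchar('\n');
        }
        putchar('\n');
        return 0;
    }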

I need to be able to compose and edit text-like data for a
number of systems that use various character sets, and I need
to prepare that data in a consistent manner while logged into
an OpenBSD system from any of those systems.

The printers and other specialized I/O devices
producing and consuming my data map 8-bit codes to glyphs
in different ways: mostly nonstandard or obscure.
Worse, many of them have multiple mappings available.

If I set the locale to "C", most systems define
the characters from 0x7F through 0xFF as not printable.
Editors and the like print them in some escaped or hex
representation, which is consistent, and I can enter the
values as hex or octal.  (n)vi, for instance, shows such
characters as a backslash and two hex digits, and accepts
them as input as control-X followed by two hex digits.
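
The classification is easy to demonstrate; a small C sketch
using only POSIX setlocale(3) and isprint(3) reports what a
given locale calls printable:

    /* Under LC_CTYPE=C, report which of the 256 byte values
     * isprint(3) classifies as printable; the rest are what an
     * editor must show escaped, e.g. as \xNN. */
    #include <ctype.h>
    #include <locale.h>
    #include <stdio.h>

    int main(void)
    {
        setlocale(LC_CTYPE, "C");
        for (int c = 0; c <= 0xFF; c++)
            printf("0x%02X %s\n", c,
                isprint(c) ? "printable" : "not printable");
        return 0;
    }

On most systems this reports only 0x20-0x7E as printable in
the "C" locale; on OpenBSD, as noted below, more of 0x7F-0xFF
comes back printable.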

The problem is that UTF-8 is a superset of US-ASCII.
Various implementations differ incompatibly in how they handle
the characters outside US-ASCII, even when supposedly set to
the same locale:
   glyphs are missing from fonts
   glyphs are inconsistent from system to system
   characters are printed as (character-0x80)
   etc.
Input encoding from keyboards often also changes.
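
To make the ambiguity concrete, here is one byte read four
ways (my own illustration, not taken from any one system):

    /* One byte, several readings:
     *   ISO 8859-1:  e-acute (U+00E9)
     *   CP437:       Greek capital theta
     *   UTF-8:       invalid on its own (lead byte of a
     *                three-byte sequence)
     *   high bit stripped (character-0x80): 0x69 = 'i' */
    #include <stdio.h>

    int main(void)
    {
        unsigned char c = 0xE9;
        printf("byte 0x%02X, stripped: 0x%02X = '%c'\n",
            c, c & 0x7F, c & 0x7F);
        return 0;
    }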

Since the OpenBSD default "C" locale defines many of the
characters 0x7F-0xFF as printable, my local systems attempt
to print the characters in ways that are often not visible
or readable.

I understand that many people don't have US-ASCII keyboards
or displays and find the limitation to that character set
a problem. Still, the ability to map character/byte/octet
values to and from visible marks in a completely consistent
manner is valuable to me and perhaps to other people as well.

>That said, it is my understanding that Unicode support in OpenBSD
>hasn't been completed yet (correct me if I'm wrong).

I don't believe it is complete either, since I rarely
see wchar-typed data in the sources.

That's (selfishly) fine for me.
Living in a US-English environment shields me from a lot
of the need for conversion.

Unicode is an extremely unpleasant thing I'm avoiding until
the worth of the outcome exceeds the pain of conversion.
For instance, will the world settle on a compressed form
using a multi-byte escape convention (UTF-8), or will all
data change to 16-bit code units? Or both?
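
For example, hand-encoding the accented letter U+00E9 shows
the contrast between the two candidates (a sketch; two-byte
UTF-8 covers U+0080 through U+07FF):

    /* Hand-encode U+00E9 (e-acute) as UTF-8 and contrast it
     * with a single 16-bit code unit.  Two-byte UTF-8 is
     * 110xxxxx 10xxxxxx. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int cp = 0x00E9;
        unsigned char hi = 0xC0 | (cp >> 6);
        unsigned char lo = 0x80 | (cp & 0x3F);
        printf("UTF-8:  %02X %02X\n", hi, lo);   /* C3 A9 */
        printf("16-bit: %04X\n", cp);            /* 00E9 */
        return 0;
    }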

Legacy hardware, programs and data have a way of living forever.
Translation to and from Unicode will be with us for a
very long time even when the common systems and programs
are all Unicode-aware. :-( :-( :-(

geoff steckel
