David Schultz wrote:
On Mon, Apr 27, 2009, Joerg Sonnenberger wrote:
On Mon, Apr 27, 2009 at 11:49:41AM -0700, Tim Kientzle wrote:
David Schultz wrote:
... whether it would make more sense to standardize on something like
UCS-4 for the internal representation.
YES. Without this, wchar_t is useless.
I strongly disagree. "Everything can be represented as UCS-4" is a bad
assumption, but one that Americans and Europeans naturally don't have
to care about.
...but isn't this moot at present because there are no
widely-accepted encodings that include characters that
aren't supported by UCS-4? Citrus doesn't seem to support
any such encodings in any case.
Citrus is based on UCS-4 as an internal encoding, just like the other
BSD-licensed iconv library. This is a barrier to supporting encodings
that UCS-4 doesn't cover.
If this ever really becomes an issue, we could always stuff
locale-dependent encodings into unused UCS-4 code pages.
However, it doesn't seem worthwhile to deliberately burden
programmers over concerns that are presently, and for the
foreseeable future, hypothetical.
I'm not a Unicode expert, but isn't the point of the periodic standard
revisions to cover more and more human languages? We could just support
the latest Unicode standard and let the Unicode working groups map those
new characters into unused code points. The Latin-based, Cyrillic,
Devanagari and CJK encodings are well supported, I think. I don't know
much about CJK encodings, though, or whether all of the thousands of
ideographs are supported. But I'd say the most significant languages
used on the Internet are supported; the rest might have other
problems...
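
To make the "stuff them into unused code points" idea from above a bit
more concrete, here is a minimal sketch. It assumes that "unused" means
Unicode's Supplementary Private Use Area-A (U+F0000..U+FFFFD), which is
only one possible reading of the proposal, and the helper names
local_to_pua() and pua_to_local() are made up for illustration. Nothing
like this exists in Citrus as far as I know.

```c
#include <assert.h>
#include <stdint.h>

/* Supplementary Private Use Area-A, as defined by Unicode. */
#define PUA_A_FIRST	0xF0000u
#define PUA_A_LAST	0xFFFFDu

/*
 * Hypothetical helper: map the index of a character from some
 * locale-specific character set that has no Unicode equivalent
 * onto a private-use code point.
 * Returns 0 on success, -1 if the index does not fit into the area.
 */
static int
local_to_pua(uint32_t local_index, uint32_t *ucs4)
{
	if (local_index > PUA_A_LAST - PUA_A_FIRST)
		return (-1);
	*ucs4 = PUA_A_FIRST + local_index;
	assert(*ucs4 >= PUA_A_FIRST && *ucs4 <= PUA_A_LAST);
	return (0);
}

/*
 * Hypothetical inverse: recover the locale-specific index.
 * Returns -1 for code points outside the private-use area.
 */
static int
pua_to_local(uint32_t ucs4, uint32_t *local_index)
{
	if (ucs4 < PUA_A_FIRST || ucs4 > PUA_A_LAST)
		return (-1);
	*local_index = ucs4 - PUA_A_FIRST;
	return (0);
}
```

The obvious drawback is that such code points only mean something to
software that agrees on the same private mapping, so this would be a
stopgap rather than real support.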
[OFF]
It's possible that there are small, poor countries with their own
writing system, but their writing system is probably unsupported because
starvation, poverty and the lack of water and electricity are more
serious problems there. My ex-girlfriend is working in Nepal in a
cooperation program (it's kind of a scholarship) and she told me that
they only have electricity for 8 hours a day, 4 during the night and 4
during the day. There are no sidewalks for pedestrians, they walk
alongside the cars on the street, and the pollution is extremely high.
Even this country's encoding is supported. What I am trying to say is
that countries with unsupported languages probably won't really care
about character encodings if they rarely have computers... I can only
hope that their living conditions will get better and that their
languages will be supported. I can also hope that the Unicode people
will focus more on these countries instead of wasting time on fictional
languages from fairy tales... [1]
I'll probably visit her in Nepal in January; it will be an interesting
experience. I'll check whether I can help the IT world there with
anything.
[ON]
Another idea to consider: are all of our utilities wchar-clean? What
about library functions? (regex surely is not.) Do we lack any important
utility or library? (We still lack iconv and gettext; what else?) What
about standards, like the C99 wchar functions? Is anything missing? What
about POSIX, if it has anything related?
Personally, I think that these are more important questions than support
for some extremely rare languages. It's worth considering how to deal
with them later, but the basic problems need a higher priority.
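
To illustrate what I mean by "wchar-clean": a utility that counts or
truncates characters with strlen() or plain byte indexing is not clean,
because it silently assumes one byte per character. A minimal sketch of
the clean approach, using only the standard C99 mbrtowc() interface
(the function name mb_char_count() is made up for this example):

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Count characters (not bytes) in a multibyte string, according to
 * the LC_CTYPE setting of the current locale.
 * Returns (size_t)-1 on an invalid or truncated sequence.
 */
static size_t
mb_char_count(const char *s)
{
	mbstate_t st;
	size_t n, count = 0, left;

	assert(s != NULL);
	left = strlen(s);
	memset(&st, 0, sizeof(st));	/* start in the initial shift state */
	while (left > 0) {
		/* A NULL first argument means we only want the length. */
		n = mbrtowc(NULL, s, left, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return ((size_t)-1);
		if (n == 0)	/* NUL; not reachable within strlen() bytes */
			break;
		s += n;
		left -= n;
		count++;
	}
	return (count);
}
```

After setlocale(LC_CTYPE, "") in a UTF-8 locale, a string holding a
single "é" encoded as two bytes gives 1 here, while strlen() gives 2.
Auditing the base utilities for this kind of byte-versus-character
assumption is the sort of work I have in mind.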
[1] http://en.wikipedia.org/wiki/Tengwar#Unicode
Cheers,
--
Gabor Kovesdan
FreeBSD Volunteer
EMAIL: [email protected] .:|:. [email protected]
WEB: http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org