Re: perl unicode support

David Starner Wed, 28 Mar 2007 11:25:16 -0800

On 3/27/07, Rich Felker <[EMAIL PROTECTED]> wrote:

On Tue, Mar 27, 2007 at 06:44:42PM -0500, David Starner wrote:
> On 3/27/07, Rich Felker <[EMAIL PROTECTED]> wrote:
This is one of the very few
places where a computer should ever perform case mappings: in a
powerful editor or word processor


Just about any program that deals with text is going to have a need to
merge distinctions that the user considers irrelevant, which often
includes case. I sometimes use grep -i even when searching the output
of my own programs. I could go back and check the case I used in the
messages, but I'd rather let the tools do that.
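For illustration, here is a minimal sketch of the kind of case-merging search described above, assuming Python 3's built-in str.casefold(). The function name grep_i and the sample log lines are hypothetical, and this is not how grep itself is implemented.

```python
def grep_i(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines that contain `pattern`, ignoring case differences."""
    # casefold() applies Unicode case folding, so "ERROR", "Error" and
    # "error" all reduce to the same comparison key.
    needle = pattern.casefold()
    return [line for line in lines if needle in line.casefold()]

# Hypothetical program output with inconsistent capitalization.
log = ["Error: disk full", "warning: low memory", "ERROR: retrying"]
print(grep_i("error", log))  # ['Error: disk full', 'ERROR: retrying']
```

Python's casefold() uses full case folding, so grep_i("straße", ["STRASSE 1"]) also matches. A given grep -i may treat characters like ß differently.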

> >The whole idea of case conversion in programming languages is
> >disgustingly euro-centric. The rest of the world doesn't have such a
> >stupid thing as case...
>
> Really? Funny, I'm from North America, and we have a concept of case

Same thing. North American civilization is all European-derived.


The civilization in North America, South America, Europe, Australia
and Antarctica is European-derived, but I find it horribly hard to
dismiss something that's universal in five of the seven continents as
"disgustingly euro-centric".

> here. 90% of the languages native to the continent are written in a
> script that has a concept of case.

Is that so? I don't think so. Rather, most of the languages native to
the continent have no native writing system, or use a writing system
that was lost or died out long ago. Perhaps you should look up the
meaning of the word native.. :)


I wrote precisely what I meant, and I stand by it as correct. Read the
sentence I wrote. No language uses a writing system that was long ago
lost; that's logically absurd. I don't believe the concept of native
writing system is clear, nor do I believe it's useful. Arguably, the
"native" writing system of Irish is Ogham and the "native" writing
system of Greek is Linear B, but from a practical aspect, Irish uses
the Latin script and Greek uses Greek and those are the realities that
we need to be dealing with.

> In fact, I think you'd find that
> most of the world's languages are written in scripts that have a
> concept of case.

This is a very dubious assertion. Technically it depends on how you
measure "most" (language count vs speaker count... also the whole
dialect vs language debate), but otherwise I think it's bogus.


The English meaning of "Most of the world's languages" is the number
of languages. All of the languages spoken in North and South America,
with the exception of Cherokee and some Canadian languages written in
the UCAS, are written in Latin. All of the languages spoken in Africa,
with the exception of a few languages written in Ethiopian and Arabic,
are written in Latin. All of the languages of Europe are written in
Latin, Greek or Cyrillic. All of the languages of Australia are
written in Latin. All of the languages of New Guinea (12% of the
world's languages) are written in Latin. Most of the languages of the
former USSR are written in Cyrillic.

I believe a majority of the world's population has as their native
language a language that does not use case.

Just take India and China and you're already almost there. Now throw
in the rest of South Asia and East Asia, all of the Arabic speaking
countries, ....


According to Wikipedia, Asia has 60% of the world's population. By my
estimates, the part of that population that uses casing scripts
(including Vietnam, Indonesia and Russia) is larger than the number
of people outside Asia who don't use casing scripts (mainly North
Africa, population-wise). 60% may be a majority, but it's hardly a huge
majority.

No, you only have to deal with the idiosyncrasies of the subset you
support. A good multilingual application will have sufficient support
for acceptable display and editing of most or all languages, but
there's no reason it should have lots of language-specific features
for each language. Why should all apps be forced to have
(Euro-centric) case mappings, but not also mappings between (for
example) the corresponding base-character and subjoined-character
forms of Tibetan letters, or transliteration mappings between Latin
and Cyrillic for East European languages?


Two issues:

Demand: 40% of the world uses cased scripts, including most of the
richest part of the world. (Compare that to the 0.1% that use Tibetan.)
Furthermore, casing is a very low-level operation; uppercase,
lowercase and titlecase forms of a word are mixed freely, with the
understanding that they are fundamentally the same word. Market share aside, I don't believe
writers of Eastern European languages frequently mix Latin and
Cyrillic in the same document.
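To make the "fundamentally the same word" point concrete, here is a small sketch assuming Python 3's built-in str methods. The word list is only an illustration. It shows that case folding maps the lowercase, uppercase and titlecase forms to one key, and that titlecase is a separate mapping for some characters, such as the Latin digraph dž (U+01C6).

```python
# Three case forms of the same word, as they might be mixed in one text.
forms = ["mississippi", "MISSISSIPPI", "Mississippi"]

# Case folding reduces every form to a single comparison key.
keys = {w.casefold() for w in forms}
print(keys)  # {'mississippi'}

# For some characters titlecase differs from uppercase:
# dž (U+01C6) uppercases to DŽ (U+01C4) but titlecases to Dž (U+01C5).
dz = "\u01c6"
print(dz.upper() == "\u01c4", dz.title() == "\u01c5")  # True True
```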

History: I'm not aware of a computer model in the history of the world
that supported text and not Latin text. If there are such, I would be
stunned if they amounted to one in a million of all computers made.
Virtually all computers in use depend on ASCII, with the remainder
depending on ASCII variants or EBCDIC variants that are equally Latin
dependent. Latin text is fundamental to computers, and the casing
operation is a part of many standards and commonly used APIs. Call it
language imperialism, but it's reality.

But each app is free to choose which language-specific frills it wants
to include support for.


And I suspect that most of them will choose to support the
language-specific frills that 40% of the world's population demand. In
fact, I don't know of a single language-specific "frill" that has as
much demand as casing. The non-casing scripts are a pretty diverse
bunch, and there is no "frill" shared by most of them that is as key
to them as casing is to Cyrillic, Latin and Greek.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
