On 3/27/07, Rich Felker <[EMAIL PROTECTED]> wrote:
On Tue, Mar 27, 2007 at 06:44:42PM -0500, David Starner wrote:
> On 3/27/07, Rich Felker <[EMAIL PROTECTED]> wrote:

This is one of the very few places where a computer should ever perform case mappings: in a powerful editor or word processor.
Just about any program that deals with text is going to have a need to merge distinctions that the user considers irrelevant, which often includes case. I use grep -i, even when searching the output of my own programs sometimes. I could go back and check the case I used in the messages, but I'd rather let the tools do that.
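As a rough illustration of the kind of merging grep -i performs, here is a minimal Python sketch. The helper name `grep_i` and the sample messages are hypothetical, and `str.casefold()` only approximates what any particular grep implementation does; this version also does plain substring matching rather than regular expressions.

```python
def grep_i(pattern, lines):
    """Return the lines that contain pattern, ignoring case.

    Hypothetical helper for illustration only: plain substring
    matching, not the regular expressions real grep supports.
    """
    needle = pattern.casefold()
    return [line for line in lines if needle in line.casefold()]


# Output of an imaginary program (made up for this example).
messages = [
    "Warning: disk almost full",
    "error: cannot open file",
    "ERROR: out of memory",
    "done",
]

# Both spellings of "error" are found without checking which case
# the program happened to use.
matches = grep_i("error", messages)
```

The point is that the caller never has to remember whether the message was written as "error" or "ERROR"; the tool folds the distinction away.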
> >The whole idea of case conversion in programming languages is
> >disgustingly euro-centric. The rest of the world doesn't have such a
> >stupid thing as case...
>
> Really? Funny, I'm from North America, and we have a concept of case

Same thing. North American civilization is all European-derived.
The civilization of North America, South America, Europe, Australia and Antarctica is European-derived, but I find it horribly hard to dismiss something that's universal on five of the seven continents as "disgustingly euro-centric".
> here. 90% of the languages native to the continent are written in a
> script that has a concept of case.

Is that so? I don't think so. Rather, most of the languages native to the continent have no native writing system, or use a writing system that was long ago lost or went extinct. Perhaps you should look up the meaning of the word native. :)
I wrote precisely what I meant, and I stand by it as correct. Read the sentence I wrote. No language uses a writing system that was long ago lost; that's logically absurd. I don't believe the concept of a native writing system is clear, nor do I believe it's useful. Arguably, the "native" writing system of Irish is Ogham and the "native" writing system of Greek is Linear B, but from a practical standpoint, Irish uses the Latin script and Greek uses the Greek script, and those are the realities that we need to be dealing with.
> In fact, I think you'd find that
> most of the world's languages are written in scripts that have a
> concept of case.

This is a very dubious assertion. Technically it depends on how you measure "most" (language count vs speaker count... also the whole dialect vs language debate), but otherwise I think it's bogus.
In English, "most of the world's languages" means most by number of languages. All of the languages spoken in North and South America, with the exception of Cherokee and some Canadian languages written in the UCAS (Unified Canadian Aboriginal Syllabics), are written in the Latin script. All of the languages spoken in Africa, with the exception of a few languages written in Ethiopic and Arabic, are written in Latin. All of the languages of Europe are written in Latin, Greek or Cyrillic. All of the languages of Australia are written in Latin. All of the languages of New Guinea (12% of the world's languages) are written in Latin. Most of the languages of the former USSR are written in Cyrillic.
I believe a majority of the world's population has as their native language a language that does not use case. Just take India and China and you're already almost there. Now throw in the rest of South Asia and East Asia, all of the Arabic speaking countries, ....
According to Wikipedia, Asia has 60% of the world's population. By my estimate, the part of that population that uses casing scripts (including Vietnam, Indonesia and Russia) is larger than the number of people outside Asia who don't use casing scripts (mainly North Africa, population-wise). 60% may be a majority, but it's hardly a huge majority.
No, you only have to deal with the idiosyncrasies of the subset you support. A good multilingual application will have sufficient support for acceptable display and editing of most or all languages, but there's no reason it should have lots of language-specific features for each language. Why should all apps be forced to have (Euro-centric) case mappings, but not also mappings between (for example) the corresponding base-character and subjoined-character forms of Tibetan letters, or transliteration mappings between Latin and Cyrillic for East European languages?
Two issues:

Demand: 40% of the world uses cased scripts, including most of the richest part of the world. (Compare to the 0.1% that use Tibetan.) Furthermore, casing is a very low-level operation; uppercase, lowercase and titlecase words are mixed freely with an understanding of the fundamental identity. Market share aside, I don't believe writers of Eastern European languages frequently mix Latin and Cyrillic in the same document.

History: I'm not aware of a computer model in the history of the world that supported text and not Latin text. If there are such, I would be stunned if they amounted to one in a million of all computers made. Virtually all computers in use depend on ASCII, with the remainder depending on ASCII variants or EBCDIC variants that are equally Latin-dependent. Latin text is fundamental to computers, and the casing operation is a part of many standards and commonly used APIs. Call it language imperialism, but it's reality.
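To make the "fundamental identity" point concrete, here is a small Python sketch. The sample words are my own, and Python's built-in string methods merely stand in for whatever casing API a given platform provides. It shows the three case forms of a word folding to one string, case mapping applied to Greek and Cyrillic (two other cased scripts named in this thread), and text in an uncased script passing through unchanged.

```python
# Upper-, lower- and titlecase forms of one word fold to the same string.
forms = ["unicode", "UNICODE", "Unicode"]
folded = {word.casefold() for word in forms}

# The same operations apply to Greek and Cyrillic text.
greek = "αθήνα"
cyrillic = "москва"
greek_upper = greek.upper()
cyrillic_title = cyrillic.title()

# Text in a script without case is returned unchanged, so applying a
# case mapping to it does no harm.
chinese = "中文"
chinese_upper = chinese.upper()
```

After this runs, `folded` holds a single entry, while `chinese_upper` is identical to `chinese`.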
But each app is free to choose which language-specific frills it wants to include support for.
And I suspect that most of them will choose to support the language-specific frills that 40% of the world's population demand. In fact, I don't know of a single language-specific "frill" that has as much demand as casing; the non-casing scripts are a pretty diverse bunch, and the majority share no "frill" as key to them as casing is to Cyrillic, Latin and Greek.

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
