--On 2002-05-04 10.26 -0700 Doug Ewell <[EMAIL PROTECTED]> wrote:

> I couldn't remember UTC ever saying such a thing, so when Mark Davis
> <[EMAIL PROTECTED]> wrote:

I said _at_least_before_version_2_. In 1995, when I worked at Bunyip Information Systems, I had a long discussion with the Unicode Consortium about how to do normalization and case folding together. Their response was to convert to lower case. Our implementation of Whois++, named Digger, also ended up in a paper which was presented at a Unicode Conference around 1995-1996.

I also saw Mark referring only to case folding, not to lower case in particular, and that's why I talk about handwaving, historical artifacts, etc.

>> There are also a number of codepoints which are lowercase which
>> doesn't have uppercase versions.
>
> Which ones? I can think of a character that looks uppercase but has no
> lowercase form (U+04C0 CYRILLIC LETTER PALOCHKA). But such letters,
> despite their appearance, are neither uppercase nor lowercase; they are
> caseless, and immune to the effects of any casing operation.

Quote from page 142 in the Unicode version 3 book:

   "Also, because many characters are really caseless (most of the IPA
   block, for example), uppercasing a string does not mean that it will
   no longer contain any lowercase letters."

I only quote the text. Yes, when reading it, one might think it should have been written "...that it will only contain uppercase letters." but it doesn't.

>> Last, some codepoints (like the german sharp-s, ß) turns to "SS" in
>> uppercase, and my guess is (with my limited knowledge of German,
>> only 2 years of studies) that one when comparing don't want that
>> similarities.
>
> German speakers are forced to deal with that mapping every day. It is a
> natural part of the language.

I know. I just gave it as one example where I _thought_ people from Germany would rather have lower case than upper case. I will wait until I hear from someone from Germany saying they prefer mapping to upper case before I say something else. We are talking about what is preferred, not what people are used to.

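A minimal sketch of the two points above, assuming Python 3 and its built-in `str.lower`, `str.upper`, and `str.casefold`. The helper names are hypothetical, and the exact set of lowercase letters without an uppercase mapping depends on the Unicode version bundled with the interpreter:

```python
import unicodedata


def case_forms(s: str) -> dict:
    """Return the lowercased, uppercased, and case-folded forms of s."""
    return {"lower": s.lower(), "upper": s.upper(), "casefold": s.casefold()}


def lowercase_without_uppercase(start: int, end: int) -> list:
    """List code points in [start, end] that are category Ll (lowercase
    letter) but are left unchanged by upper()."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) == "Ll" and chr(cp).upper() == chr(cp)
    ]


# Sharp s (U+00DF): upper() expands it to "SS", casefold() to "ss",
# and lower() leaves it alone.
sharp_s = case_forms("\u00df")

# IPA Extensions block, U+0250..U+02AF: lowercase letters that stay
# lowercase after uppercasing.
ipa_caseless = lowercase_without_uppercase(0x0250, 0x02AF)

# Going through uppercase is lossy: the sharp s does not come back.
round_trip = "stra\u00dfe".upper().lower()

print(sharp_s)
print(len(ipa_caseless), "lowercase letters in the IPA block have no uppercase form")
print(round_trip)
```

Mapping to upper case makes "straße" and "strasse" compare equal; plain lowercasing keeps them distinct, while full case folding again treats them as the same.
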
>> And, personally, I rather see bq-asdqwe123 than BQ-ASDQWE the few
>> times I hope I see a domain name used in protocols natively in its
>> ACE encoding.
>
> No argument there. All-lowercase is widely recognized as being easier
> to read than all-uppercase, primarily because of the greater variation
> in letterforms. But again, there doesn't seem to be any evidence that
> the Unicode Consortium has made any of the claimed statements about
> "preferring" lowercase or about the mapping to lowercase being more
> "consistent." Please dig up the relevant references, if possible.

After some digging, I see that the paper Philippe and I wrote was presented as A5 at Unicode Conference number 9, on September 5, 1996:

   Text Searching Across Multiple Character Sets in Unicode
   Philippe Boucher, Senior Programmer, Bunyip Information Systems,
   Montreal, Quebec, Canada

I cannot find the document, though.

   paf
