RE: Software support costs (was: Nicest UTF

2004-12-11 Thread Carl W. Brown
Philippe, However, within the program itself UTF-8 presents a problem when looking for specific data in memory buffers. It is nasty, time consuming and error prone. Mapping UTF-16 to code points is a snap as long as you do not have a lot of surrogates. If you do then probably UTF-32 should be

RE: When to validate?

2004-12-10 Thread Carl W. Brown
Jill, I think that the best practice is to validate input. Besides the overhead of revalidating there is the issue of what do you do with data that contains invalid characters. This has to be handled explicitly. Once validated all transforms should maintain valid data. If you also provide a

Software support costs (was: Nicest UTF

2004-12-10 Thread Carl W. Brown
Philippe, Also a broken opening tag for HTML/XML documents In addition to not having endian problems UTF-8 is also useful when tracing intersystem communications data because XML and other tags are usually in the ASCII subset of UTF-8 and stand out making it easier to find the specific data you

text-transform (was: CSS3, Unicode BIDI, and Vertical Text Layout

2004-10-20 Thread Carl W. Brown
What amazes me is that no one has addressed numeric input. Often companies to simplify i18n use web servers and browsers for data processing. Much of that involves forms and these forms have mixed alphanumeric and numeric only fields. To the best of my knowledge nowhere can I specify numeric

RE: Common Locale Data Repository 1.1 beta

2004-05-17 Thread Carl W. Brown
Mark, I am impressed with the data collected but have problems with the structure and some of the actual data values. For example if I want to handle date/time data I need time zone info. I may also need country information to parse and format the date as well and language info for things

Special Casing (Was: Writing Tatar using the Latin script; new characters to encode?

2004-05-11 Thread Carl W. Brown
Eric, 1. Does somebody have more information about that effort? Eki lists four characters as needed but missing in Unicode (see http://www.eki.ee/letter/chardata.cgi?lang=tt+Tatarscript=latin). I had suggested earlier that Tatar be added to the special case rules for dotted and dotless I

RE: TR35

2004-05-11 Thread Carl W. Brown
Doug, The issue of French as spoken in Switzerland versus French as spoken in Canada is totally unrelated to the issue of Swiss conventions versus Canadian conventions for sorting, date and time format, decimal separator, and so forth. As for time zones, I agree completely with Mark that

RE: TR35

2004-05-11 Thread Carl W. Brown
Peter, If I live in Guam I will probably be using an en_US locale. However the US territory does not contain my time zone. Probably the best solution for this problem is to add a category of possessions to the territory information. This allows applications to enumerate available time zones

RE: TR35 (was: Standardize TimeZone ID

2004-05-08 Thread Carl W. Brown
Mark, Do you know if there is an official list of country possessions? Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Mark Davis Sent: Friday, May 07, 2004 5:28 PM To: Carl W. Brown; Unicode List Subject: Re: TR35 (was: Standardize TimeZone ID

TR35 (was: Standardize TimeZone ID

2004-05-07 Thread Carl W. Brown
Mark, LDML does require the Olson IDs to identify time zones (as does Unix, Java, ICU,...). See the discussion in http://www.unicode.org/reports/tr35/. I found a normalization problem with the IDs. For example you have both Asia/Istanbul and Europe/Istanbul which are different names for

RE: TR35 (was: Standardize TimeZone ID

2004-05-07 Thread Carl W. Brown
Mark, That is not a problem. The Olson IDs are not guaranteed to be unique, just unambiguous. And there are aliases. Typically these are de-unified for political purposes. Thus you may find that two different IDs produce the same results over the entire period of time in the database.

RE: Newbie questions: 1) Surrogates in WinXP? 2) Unicode in PostS cript?

2004-04-10 Thread Carl W. Brown
Markus, Rick Cameron wrote: IMHO, that's a bit misleading. The String class itself does not appear to be aware of SMP characters. It clearly uses UTF-16, and the length it reports is the number of code units, not the number of characters or graphemes in the string. There is no
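The distinction drawn here between code units and characters can be illustrated with a minimal Python sketch. The helper name `utf16_code_units` and the sample string are illustrative assumptions, not anything taken from the thread; the point is only that a supplementary-plane character occupies two UTF-16 code units.

```python
def utf16_code_units(text):
    """Return how many 16-bit code units text needs when stored as UTF-16."""
    # Encoding without a BOM gives exactly two bytes per code unit.
    return len(text.encode("utf-16-le")) // 2


# 'a' followed by U+10400, a character outside the BMP.
sample = "a\U00010400"

print(len(sample))               # number of code points
print(utf16_code_units(sample))  # number of UTF-16 code units
```

A string class that reports the second number as its length is counting storage units, which is what the quoted message describes.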

RE: Newbie questions: 1) Surrogates in WinXP? 2) Unicode in PostScript?

2004-04-05 Thread Carl W. Brown
Benjamin, Versions up until Windows 2000 use UCS-2 internally. 2000 and XP use UTF-16, although applications tend to have differing levels of awareness about surrogates. You can enable Win2K surrogate support http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicod

RE: [OT] proscribed words... (was:What is the principle?)

2004-03-29 Thread Carl W. Brown
Philippe, its set of proscribed words including in programs that were designed to filter the words out of text. Does this list really exist? Seriously, there's no word that can be proscribed, because they are not themselves infamous. What is infamous or dangerous is their use to make

RE: What is the principle?

2004-03-28 Thread Carl W. Brown
James Kass, U+E000 COMBINING BLACK BLOB? Censors would probably love it. It is a much more universal solution than the one that the censors really wanted. COMBINING EXPLETIVE DELETE The character would be inserted after all words and delete them if they were on a proscribed list of forbidden

RE: Irish dotless I (was: Languages with letters that always take diacriticals

2004-03-17 Thread Carl W. Brown
Marion, That particular campaign was such a resounding 'success' we went on to spend thousands of quid each year, for many years, trekking one more encoding campaign trail after another, in support of many other languages, as well as our own. It reminds me of my work on a multi-lingual

Irish dotless I (was: Languages with letters that always take diacriticals

2004-03-16 Thread Carl W. Brown
Marion, Irish in Roman script is written i with dot above, Irish in traditional script is written i without dot above. The current flooding of our local advertising and publishing markets by various non-native uncial fonts to write our language goes against tradition in imposing on us that

RE: unicode format

2004-02-23 Thread Carl W. Brown
Mark, Markus did a good job of describing the advantages of each. The problem that I see is that there are applications that are not enabled to do BOM processing and convert from little-endian to big-endian and the other way around. Are there any browsers that support Unicode but will not do

[OT] Euro-English (was: Corea? (Re: Swastika to be banned by Microsoft?)

2003-12-15 Thread Carl W. Brown
Euro-English The EU announces changes to the spellings of common English words... European Union commissioners have announced that agreement has been reached to adopt English as the preferred language for European communications, rather than German, which was the other possibility. As part of

RE: Case mapping of dotless lowercase letters

2003-12-15 Thread Carl W. Brown
Jill, The dotted and dotless i are distinctly different, however I like to fold them when doing searches because I don't know of any cases where it would cause search problems. However if I am searching for Istanbul and want to include the dotted spelling as well. Carl -Original

RE: Fonts on Web Pages

2003-12-02 Thread Carl W. Brown
in each page: !-- /* $WEFT -- Created by: Carl W. Brown ([EMAIL PROTECTED]) on 2/17/2002 -- */ @font-face { font-family: Papyrus; font-style: normal; font-weight: normal; src: url(PAPYRUS3.eot); } -- Carl -Original Message-From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]On Behalf

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
Jill, I know that Unicode does have some locale-sensitive case mappings (Turkish uppercase I to dotless lowercase I for example), I was under the impression that ss to ß was not one of them. You are correct that SS and ß are the same in case insensitive compares regardless of locale. I

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
Mark, But there's no official Unicode standard that I know of (and that isn't saying much) that says that ss and ß have to compare as equals. http://www.unicode.org/Public/UNIDATA/CaseFolding.txt Carl
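The CaseFolding.txt file linked above carries a full case folding entry that maps U+00DF (ß) to "ss". As a rough illustration, Python's `str.casefold()` applies full case folding, so a caseless comparison can be sketched as follows; the helper name `caseless_equal` is an assumption made for this example.

```python
def caseless_equal(a, b):
    """Compare two strings after full Unicode case folding."""
    return a.casefold() == b.casefold()


print("ß".casefold())                        # folds to two letters
print(caseless_equal("Straße", "STRASSE"))   # caseless match
print("Straße".lower() == "STRASSE".lower()) # simple lowercasing does not match
```

This is locale-independent folding; it does not cover the Turkish dotted/dotless i case, which needs locale-specific handling.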

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
Doug, You might remember that I chided Microsoft for its definition of Unicode in Windows 2000 Help, where Unicode was described as a 16-bit standard that was developed between 1988 and 1991, implying that the work was finished. Even at the time Windows 2000 was being developed, there

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
. Brown Cc:[EMAIL PROTECTED] Subject: RE: MS Windows and Unicode 4.0 ? Carl W. Brown wrote: Doug writes: You might remember that I chided Microsoft for its definition of Unicode in Windows 2000 Help, where Unicode was described as a 16-bit standard that was developed between

RE: Hexadecimal digits?

2003-11-11 Thread Carl W. Brown
Michael, This is another strawman argument isn't it? Nobody on this thread has said they want monospaced alphanumerics. No, but the responsibles have responded and informed the list that clones of Latin letters A-F will not be entertained. How 'bout we drop the discussion? Before dropping

RE: W3C Objects To Royalties On ISO Country Codes

2003-09-21 Thread Carl W. Brown
Michael, Tim Berners-Lee has sent a letter of concern to the president of ISO about the idea of collecting royalties on...guess what... ISO language and country codes! According to the letter, the ISO Commercial Policies Steering Group is proposing a royalty on commercial use of ISO

[Way OT] God punishes people who go too OT (was: Beer measurements

2003-08-21 Thread Carl W. Brown
Mark, Right after Ken was so nice to take the beer OT topic to a group message off line, I got hit with sobig-f. Over 1000 messages per day and I did not open an attachment. I know that part of what makes us i18n folks is detail. But too often we carry it too far even with topics

RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Carl W. Brown
Mark, Yes, I am sick and tired of dealing with this horrible non-decimal measurement system the US has for time: the number of units per other unit vary all across the board: 60..61 : 1, 60 : 1, 24 : 1, 28..31 : 1, 12 : 1, 365..366 : 1 -- awful. At least with inches, feet, and miles,

RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Carl W. Brown
John, A kilosec is a reasonable amount of time to wait for a late appointment (in some countries, anyhow). A megasec is enough time to do a small project. If a marriage lasts a gigasec, it is doing very well. 1 pictun = 20 baktun = 2,880,000 days = approx. 7885 years 1 calabtun = 20

RE: AL32UTF8 Vs UTF8

2003-08-14 Thread Carl W. Brown
Jay, Oracle's UTF-8 is not really a valid encoding. It encodes surrogates as if they were characters. They kept the old Unicode 2.x code that only supports BMP to provide sort key compatibility for clients who never upgraded to Unicode 3.0 support and are using 16 bit character encoding
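What "encodes surrogates as if they were characters" means at the byte level can be sketched in Python. This is an illustration of the general technique only, not Oracle's code; the function name `encode_surrogates_as_chars` is hypothetical.

```python
def encode_surrogates_as_chars(text):
    """Encode text so that each UTF-16 surrogate becomes its own 3-byte sequence."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP characters are encoded exactly as in UTF-8.
            out += ch.encode("utf-8")
        else:
            # Split into a UTF-16 surrogate pair, then encode each half
            # separately, which standard UTF-8 does not allow.
            cp -= 0x10000
            for unit in (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


print(encode_surrogates_as_chars("\U00010000").hex())  # six bytes
print("\U00010000".encode("utf-8").hex())              # four bytes in real UTF-8
```

For BMP-only text the two encodings are byte-for-byte identical, which is why the difference only shows up once supplementary characters appear.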

RE: [OT?] LCD/LED Keyboard

2003-07-25 Thread Carl W. Brown
Thomas, It's all well and good to change the keyboard layout, but it can be confusing if it becomes too different from the physical keyboard (esp. if one has to type something in a totally different alphabet). Now, if anybody would manufacture keyboards with tiny LCD displays on each key,

RE: 24th Unicode Conference - Atlanta, GA - September 3-5, 2003

2003-07-10 Thread Carl W. Brown
Tim, The point is not that any potential attendee would actually travel to the wrong place. It is that advertising the 24th conference as Atlanta, GA but the 23rd as Prague, Czech Republic is part of a cultural arrogance in the USA. We should have the next conferences in San Jose, Costa Rica

RE: [OT] No more IE for Mac

2003-06-15 Thread Carl W. Brown
I disagree with Philippe's message in that I think that it is based on Microsoft's determination to follow the idea that browsers are not applications but part of the OS. To clarify my statement. I think Philippe's message was appropriate to this forum. It was far more pertinent to Unicode

RE: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Carl W. Brown
to cover. Even if the user does not read the language they may be able to recognize the name. From one of my sites: !-- /* $WEFT -- Created by: Carl W. Brown ([EMAIL PROTECTED]) on 2/17/2002 -- */ @font-face { font-family: Papyrus; font-style: normal; font-weight: normal; src: url

RE: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Carl W. Brown
Chris, I think that a Klingon web site that uses UTF-8 and the PUA with your own font is very Unicode savvy. Carl It's certainly a lot more savvy than using Latin-1 characters to encode Klingon. If nothing else we need to discourage people from using the Latin-1 code page

RE: Not snazzy (was: New Unicode Savvy Logo)

2003-05-30 Thread Carl W. Brown
Philippe, From: Carl W. Brown [EMAIL PROTECTED] It looks to me like UNCODE. Has the UN taken a role in globalization? Maybe the web page has no scripting but is still savvy. Wrong! You strip the very visible dot from the i letter, you also refuse to see that there's a ligature

RE: Not snazzy (was: New Unicode Savvy Logo)

2003-05-29 Thread Carl W. Brown
Marco, No, archaic, American and informal are usage labels, not translations. The translation is buon senso. (BTW, it is: Dizionario Garzanti di inglese, Garzanti Editore, 1997, ISBN 88-11-10212-X) Webster's has to know, to understand or common sense, understanding. In actuality it is

RE: UTF-24

2003-04-04 Thread Carl W. Brown
Doug, Most likely because no modern computer uses a 3-byte (24-bit) internal processing unit, and because it would be false economy for real-world Unicode text (see (1) and (2) above). What would be worse is to have an implementation like the old IBM 360 computers where the 24 bit addresses

RE: Finding a font that contains a particular character

2003-02-17 Thread Carl W. Brown
Alan, IE uses mlang to determine if you have the right fonts for the characters. http://msdn.microsoft.com/library/default.asp?url=/workshop/misc/mlang/overview/overview.asp Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Alan Wood Sent:

RE: Converting EBCDIC to Unicode

2003-02-12 Thread Carl W. Brown
Markus, There are some more characters that have the same codes in most EBCDIC codepages, but there are also some where the Latin letters are not all present. (I think some old Japanese EBCDIC codepages replace small Latin letters with Katakana ones.) That is true. The half width

RE: urban legends just won't go away!

2003-01-30 Thread Carl W. Brown
Barry, If you think that this is bad try 390 mainframe EBCDIC shift to upper case. You can shift up to 256 characters at a time with a single machine language instruction by ORing a line of spaces to your character field. Now that is bit flipping and is still heavily used. Carl -Original
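The trick works because an EBCDIC space is 0x40 and the lowercase letters differ from their uppercase forms only in that bit. A small Python sketch, assuming code page 037 (Python's `cp037` codec) as the EBCDIC variant and using a hypothetical helper name:

```python
EBCDIC_SPACE = 0x40  # the space character in code page 037


def or_with_spaces(text):
    """Uppercase EBCDIC letters by ORing every byte with a space (0x40)."""
    ebcdic = text.encode("cp037")
    # On the mainframe this is one instruction over the whole field;
    # here it is spelled out byte by byte.
    shifted = bytes(b | EBCDIC_SPACE for b in ebcdic)
    return shifted.decode("cp037")


print(or_with_spaces("hello world 42"))
```

Digits, spaces and uppercase letters already have that bit set, so they pass through unchanged. Other bytes without the bit set would also be altered, so the shortcut is only safe on fields known to hold letters, digits and spaces.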

RE: Precomposed Ethiopic (Was: Precomposed Tibetan)

2002-12-18 Thread Carl W. Brown
Marco, I agree. I did some basic design work on an Ethiopian system and it was decided to follow the same implementation system as Thai. We don't encode every possible Thai glyph. We felt that if it were ever Unicode encoded we needed to use the decomposed characters rather than decomposing

RE: Precomposed Tibetan

2002-12-17 Thread Carl W. Brown
Marco, I was disappointed that Unicode used precomposed encoding for Ethiopic. Carl

RE: Precomposed Tibetan

2002-12-17 Thread Carl W. Brown
Michael, I was disappointed that Unicode used precomposed encoding for Ethiopic. Heavens, why? I assume that you are being tongue-in-cheek. If not: Since you key in syllables as consonant+vowel combinations you can keep the encoding under 256 characters like most other languages with

RE: Morse coded Unicode(was: Morse code

2002-11-21 Thread Carl W. Brown
translating between different language that represent different cultures. Carl -Original Message- From: Stefan Persson [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 20, 2002 2:33 PM To: Carl W. Brown; [EMAIL PROTECTED] Subject: Re: Morse coded Unicode(was: Morse code - Original

Morse coded Unicode(was: Morse code

2002-11-20 Thread Carl W. Brown
Tex, I think that the bigger issue might be how do you extend Morse code to incorporate the Unicode character set. Other than an enormous number of dots and dashes per character there are other issues. Without case do you need a German sharp s? Does the final sigma need two forms? How do you

RE: Morse code

2002-11-18 Thread Carl W. Brown
Radovan, I seem to remember that just recently Morse code was dropped and is no longer used officially. Braille is different. Unicode does support dead scripts for scholarly use. Do you think that there will be many scholarly texts that will be written in Morse code? Carl -Original

RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030

2002-11-15 Thread Carl W. Brown
Doug, However, 16 bit characters were a hard enough sell in the good old days. If we had started out withug 2bit characters we would still be dreaming about Unicode. I think Carl meant with 32-bit characters. I don't know what kind of word withug is (Old English?), but I like it. It

RE: IBM AIX 5 and GB18030

2002-11-14 Thread Carl W. Brown
Jane, One of the problems is that early Unicode adopters used the 16 bit UCS-2 encoding form of Unicode. Converting to UTF-16 requires surrogate support. Some of the GB18030 characters require this support. ICU is dedicated to Unicode support so a lot of effort is put into ICU to keep it up to

RE: IBM AIX 5 and GB18030

2002-11-14 Thread Carl W. Brown
] [mailto:unicode-bounce;unicode.org]On Behalf Of Markus Scherer Sent: Thursday, November 14, 2002 9:18 AM To: unicode Subject: Re: IBM AIX 5 and GB18030 Carl W. Brown wrote: Some Unix systems adapted faster because the later Unicode adopters used 32 bit Unicode characters making the job

UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030

2002-11-14 Thread Carl W. Brown
Markus, You seem to suggest that there is a problem with 16-bit Unicode. It does take some effort to adapt UCS-2-designed functions for UTF-16, but it's not rocket science and works very well thanks to the Unicode allocation practice (common characters in the BMP). Making UTF-8/32 functions

RE: Lunate, Terminal, and Medial Sigma

2002-11-10 Thread Carl W. Brown
Jim, There already is a Unicode solution for the problem. Check UAX #21. If search engines use case insensitive compares then it should be no problem. There are a lot of exceptions to the rule so that you need separate characters for the forms but you also need an algorithm that works

RE: ct, fj and blackletter ligatures

2002-11-02 Thread Carl W. Brown
Thomas, It seems that the private use area is abused. If you are sending characters between two systems that are not a part of the Unicode standard then you can use the private use area with agreed code points. With ligatures you scan the text and identify ligature pairs. The resultant text is

CSS IME control(was: Last call WDs: css3-text, css3-ruby

2002-10-26 Thread Carl W. Brown
Mark, Do you know if there is any CSS work on defining field contents? I have run into a number of cases where I wanted to distinguish between text and numeric only input fields. The numeric field entry would disable the IME so that the user could enter standard Latin narrow digits. With

Java Unicode support

2002-10-25 Thread Carl W. Brown
What level of Unicode does Java currently fully support? Carl

RE: Revised proposal for Missing character glyph

2002-08-26 Thread Carl W. Brown
William, -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of William Overington Sent: Friday, August 23, 2002 12:55 AM To: James Kass; Carl W. Brown; Unicode List Cc: [EMAIL PROTECTED] Subject: Re: Revised proposal for Missing character glyph

RE: Revised proposal for Missing character glyph

2002-08-26 Thread Carl W. Brown
Ken, The little square boxes do not help much if you want to know exactly what the missing characters are. I do however feel that any solution to the problems should be Unicode based. If left to the vendors they may display the code page characters and you are guessing again. The tool idea is

RE: Revised proposal for Missing character glyph

2002-08-19 Thread Carl W. Brown
Ken, This is an alternate to representing bad glyphs with a missing glyph character. People can implement either. -Original Message- From: Kenneth Whistler [mailto:[EMAIL PROTECTED]] Sent: Friday, August 16, 2002 2:28 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL

Revised proposal for Missing character glyph

2002-08-16 Thread Carl W. Brown
Proposed unknown and missing character representation. This would be an alternate to the method currently described in 5.3. The missing or unknown character would be represented as a series of vertical hex digit pairs for each byte of the character. BMP characters would be represented with 4 hex

Proposal (was: Missing character glyph)

2002-08-04 Thread Carl W. Brown
of some Latin text. It would be higher than wide but not as high as the 6 hex digit grouping. Carl W. Brown

RE: Proposal (was: Missing character glyph)

2002-08-04 Thread Carl W. Brown
With a bit more thought we might reduce the minimum point size of an unrenderable character as follows: The numbers represent a dot position if that bit is a one. It is blank if the bit is 0. The XX characters are lines with an inverted wide squared U at the top with the edges coming down to

RE: Unicode and Security

2002-02-10 Thread Carl W. Brown
Doug, I agree. I used to do security consulting and found that the biggest problem was that people tried to come up with solutions for the wrong problem. We can go back to the typewriter days when there was no difference between 1 and l or 0 and O. Do you blame ASCII if you type ST0P instead of

xIUA 3.2 is available

2001-12-06 Thread Carl W. Brown
xIUA 3.2 with ICU 2.0 support is available from X.Net, Inc. It is also compatible with ICU 1.8.1. http://www.xnetinc.com/xiua/ Upgrade instructions for prior releases are available from X.Net, Inc. Carl

RE: CP1256 and Persian YEH?

2001-10-12 Thread Carl W. Brown
Roozbeh, I was told that there was a special (semi official) version of Win98 that added 4 missing letters in CP1256 by replacing Latin letters to create CP1256mod. It used LCID 0826. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Roozbeh

RE: CP1256 and Persian YEH?

2001-10-12 Thread Carl W. Brown
patches but I don't think that the MS folks ever did an Urdu patch. Carl -Original Message- From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]] Sent: Friday, October 12, 2001 9:01 AM To: Carl W. Brown; [EMAIL PROTECTED] Subject: Re: CP1256 and Persian YEH? Probably mistaken

RE: Roadmaps

2001-10-10 Thread Carl W. Brown
Keld Simonsen, -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Keld Jorn Simonsen Sent: Wednesday, October 10, 2001 1:07 PM To: Michael Everson Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: Roadmaps On Wed, Oct 10, 2001 at 07:53:51PM

RE: Roadmaps

2001-10-10 Thread Carl W. Brown
Keld, In the case of ISO 639 there is an online, official, up-to-date registry available at the Library of Congress site. It is there because the same codes are used in the MARC standard. However even though they seem to keep it up to date, it is an unofficial copy of the standard. Other

RE: Unicode locale id

2001-10-04 Thread Carl W. Brown
Bent Herlevsen, -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Magda Danish (Unicode) Sent: Thursday, October 04, 2001 10:00 AM To: [EMAIL PROTECTED] Subject: FW: Unicode locale id -Original Message- From: [EMAIL PROTECTED]

RE: surrogate at java's property file

2001-10-04 Thread Carl W. Brown
Addison, It might be easier to convert the JVM from UCS-2 to UTF-32 so that you do not have to worry about surrogates. This would more closely match most Unix implementations (except Sun) where Java is widely used. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL

RE: Deseret keyboard (was:Re: Special Type Sorts Tray 2001)

2001-10-03 Thread Carl W. Brown
Doug, I suspect that since it was a phonetic spelling system and the writings varied with the writer's pronunciation that individualized keyboard layouts could be a personal preference as well. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of

RE: Deseret keyboard (was:Re: Special Type Sorts Tray 2001)

2001-10-03 Thread Carl W. Brown
if users could create sig's to define the layout. Now all I need is the Klingon font, thanks to this thread I found the Deseret font. - Dave "Carl W. Brown" [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/03/0

RE: Special Type Sorts Tray 2001

2001-10-02 Thread Carl W. Brown
MichKa, And I am sure Apple is hard at work on the Desert font and keyboard for Mac OS 11? :-) Getting the scripts defined will allow third parties to add support to most operating systems for specific languages that are not supported by the standard offerings. The big deal will be

RE: Special Type Sorts Tray 2001 (derives from Egyptian Transliteration Characters)

2001-09-30 Thread Carl W. Brown
William, It looks like if you really want multilingual support that you need to run your text through a layout engine. If that is the case then you can remap certain characters or character combinations into the U+FDD0 to U+FDEF Unicode range and use this special non-character area for what

RE: Shape of the US Dollar Sign

2001-09-29 Thread Carl W. Brown
Michka, I have also heard that the dollar sign comes from a U superimposed over an S and the bottom of the U was dropped. This would be hard to do on a typewriter because the two lines would be so close that they would be indistinct and would fill with lint from the ribbon. I suspect that the

RE: a joke- with no typos or end in sight

2001-09-26 Thread Carl W. Brown
Tex, ok i'll quit I figured that you would drag some GIFTS (Poison) from your MIST (Manure) ridden mind. Carl

RE: a joke- with no typos

2001-09-25 Thread Carl W. Brown
Tom, If i can b so bold as 2 pen a pun or 2. Punning is a vocabulary mind set that even a pica-mind can render. There is no bad joke like a good pun. It is a great way to lose friends and make enemies. The only really challenging puns are the multilingual ones. Now that I have been avoiding

RE: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Carl W. Brown
Mike, The typical situation involves cases where large data sets are cached in memory, for immediate access. Going to UTF-32 reduces the cache effectively by a factor of two, with no comparable increase in processing efficiency to balance out the extra cache misses. This is because

RE: 3rd-party cross-platform UTF-8 support

2001-09-24 Thread Carl W. Brown
Tom, Andy Heninger writes: Performance tuning is easier with UTF-16. You can optimize for BMP characters, knowing that surrogate pairs are sufficiently uncommon that it's OK for them to take a bail-out slow path. Sure, but if you are using UTF-16 (or any other multibyte encoding) you

RE: Position of 1 and 0

2001-09-24 Thread Carl W. Brown
this response. At 12:49 -0500 2001-09-24, Eric Fischer wrote: Michael Everson [EMAIL PROTECTED] quotes Carl W. Brown: This is logical. Originally typewriters had no 1 or 0. You could use the letters l and O. They look the same so that is good enough until computers came along

RE: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Carl W. Brown
Mike, If you think you have the answer to all the problems, then you don't know all the problems. I tried to make a point, and apparently made it poorly. I will try again. It seems that some people are arguing that UTF-16 is the ideal solution for all computing, and that

RE: [lojban] (from lojban-beginners) pi'e

2001-09-22 Thread Carl W. Brown
Edward, Typewriters, computer keyboards, and school recitations still put 0 after 9 rather than before 1. Such is Human Stupidity. This is logical. Originally typewriters had no 1 or 0. You could use the letters l and O. They look the same so that is good enough until computers came along and

Developing UTF-8 support

2001-09-22 Thread Carl W. Brown
When developing xIUA, I designed UTF-8 support to be used two different ways. One as a form of Unicode and the other as yet another code page. In either case the two are handled with few exceptions in the same manner. The only difference is when you want to convert from UTF-8 to an underlying

RE: 3rd-party cross-platform UTF-8 support

2001-09-20 Thread Carl W. Brown
Ken I have to convert from UTF-8 to UTF-16, before calling ICU functions (such as ucol_strcoll() ) I'm worried about the performance overhead of this conversion. You shouldn't be. The conversion from UTF-8 to UTF-16 and back is algorithmic and very fast. To make this conversion
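To show why the conversion is called algorithmic, here is a minimal Python sketch of UTF-8 to UTF-16 conversion done by bit manipulation alone. It assumes well-formed input and performs no validation of continuation bytes or overlong forms; the function name is hypothetical and this is not ICU's implementation.

```python
def utf8_to_utf16_units(data):
    """Convert well-formed UTF-8 bytes into a list of UTF-16 code units."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells us the sequence length and its payload bits.
        if lead < 0x80:
            cp, length = lead, 1
        elif lead >> 5 == 0b110:
            cp, length = lead & 0x1F, 2
        elif lead >> 4 == 0b1110:
            cp, length = lead & 0x0F, 3
        elif lead >> 3 == 0b11110:
            cp, length = lead & 0x07, 4
        else:
            raise ValueError("unexpected lead byte at offset %d" % i)
        # Each continuation byte contributes six more bits.
        for trail in data[i + 1:i + length]:
            cp = (cp << 6) | (trail & 0x3F)
        i += length
        if cp < 0x10000:
            units.append(cp)
        else:
            # Supplementary characters become a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


text = "a\u20ac\U0001F600"
print([hex(u) for u in utf8_to_utf16_units(text.encode("utf-8"))])
```

No table lookups are involved, only shifts and masks, which is the basis for the claim that the round trip is cheap. A production converter would also reject malformed sequences.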

RE: discontent about Indic scripts and Unicode

2001-09-19 Thread Carl W. Brown
Ram, If ISCII is intended as a pan-Indic solution does it also support Urdu? Carl

RE: discontent about Indic scripts and Unicode

2001-09-19 Thread Carl W. Brown
Ram, ISCII has escape sequences which announce the start of a new Indic script. An ATR char followed by special codepoint forms the escape sequence. It is possible to support a page that contains different Indic scripts. There are problems with the standard like, it assumes a default

RE: PDUTR #26 posted

2001-09-18 Thread Carl W. Brown
Doug, It is true that the *specific* irregular UTF-8 sequences introduced (and required) by CESU-8 decode to characters above 0x when interpreted as CESU-8, and to pairs of surrogate code points when (incorrectly) interpreted as UTF-8. Since definition D29, arguably my least favorite

RE: 6 questions

2001-09-18 Thread Carl W. Brown
Bernard, Many of your questions have been answered by others but I want to add a few comments. 1. Why does Unicode say that there are 63486 code values available to represent characters with single 16 bit values and 2048 available to represent an additional 1,048,544 characters as
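The figures in the question can be reproduced with a little arithmetic. The sketch below is one reading, not something stated in the truncated message: it assumes the counts exclude the 2048 surrogate code values and the two noncharacters at the end of each plane.

```python
# 16-bit code values in total.
bmp_total = 0x10000

# Surrogate code values U+D800..U+DFFF, reserved for pairs.
surrogates = 0xE000 - 0xD800

# Assumption: U+FFFE and U+FFFF are excluded as noncharacters.
bmp_noncharacters = 2

single_unit_values = bmp_total - surrogates - bmp_noncharacters

# 1024 high surrogates times 1024 low surrogates.
pair_combinations = 1024 * 1024

# Assumption: the last two code points of each of the 16
# supplementary planes are excluded as noncharacters.
supplementary_noncharacters = 2 * 16

supplementary_characters = pair_combinations - supplementary_noncharacters

print(surrogates, single_unit_values, supplementary_characters)
```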

RE: discontent about Indic scripts and Unicode

2001-09-18 Thread Carl W. Brown
Ken, Even those who do not know the details of Indic processing know that you can not argue both sides of the issue. There was a lot of criticism of the fact that there were differences in scripts yet there was no mention that Unicode because of its extended code base does support

RE: discontent about Indic scripts and Unicode

2001-09-18 Thread Carl W. Brown
Ram, ISCII has escape sequences which announce the start of a new Indic script. An ATR char followed by special codepoint forms the escape sequence. It is possible to support a page that contains different Indic scripts. There are problems with the standard like, it assumes a default

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
MichKa, Actually, once its in IANA then it is legal in XML and other places, and *everyone* will have to support it, whether they want to or not. What is supposedly private will become quite public. IANA, after all, does not have charsets that they register for people to not use and none of

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
MichKa, Also, Toby was not attempting to be deceitful, AFAIK. The original proposal he submitted (still called UTF-8S) was not in any way contradictory but many people objected to various issues within it and the way many things were presented. The current proposal was a very rushed

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
Doug, But if people start compromising their UTF-8 parsers to accommodate CESU-8 adaptively, it would be a great blow to UTF-8. It would essentially undo all the tightening-up that was accomplished by the Corrigendum, and it would revive all the old Bruce Schneier-style skepticism about

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
Mark, - Just because it is in IANA does *not* mean that everyone will support it. There are many encodings in IANA supported by very few people. Nor does it mean that it is intended for widespread public use. The IANA registry is also used as a general purpose registry, even for encodings

RE: CESU-8: to document or not

2001-09-17 Thread Carl W. Brown
Addison, By providing a documented, standard way to refer to legacy versions of these products and their encodings, I can more readily rely on having a well-documented range of protocols and procedures for converting and validating data exchanged with these systems. The argument that

RE: CESU-8 vs UTF-8

2001-09-16 Thread Carl W. Brown
MichKa, Many people believe that any rule or law that makes no sense or cannot be enforced weakens all other laws. I believe that publishing an inconsistent document that would allow any reasonably intelligent reader to come to the same conclusions as you did, and the standard itself would

RE: CESU-8 vs UTF-8

2001-09-16 Thread Carl W. Brown
Marcin, We can't change the past, but I hope that at least UTF-8 processing can be done without treating surrogates in any special way. Surrogates are relevant only for UTF-16; by not using UTF-16 you should be free of surrogate issues, except by having a silly unused area in character

RE: CESU-8 vs UTF-8 (Was: PDUTR #26 posted

2001-09-15 Thread Carl W. Brown
different sort orders. Let's fix the problem the right way. Thank you, (Now stepping off the soap box) Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Carl W. Brown Sent: Friday, September 14, 2001 9:40 PM To: [EMAIL PROTECTED] Subject: CESU-8

RE: CESU-8 vs UTF-8

2001-09-15 Thread Carl W. Brown
Doug, This was my solution long ago: fix the code that sorts in UCS-2 order so that supplementary characters are sorted correctly. In case there is any disagreement about this, sorting by UCS-2 order has been WRONG ever since surrogates and UTF-16 were invented. However, the database
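The ordering problem being discussed can be shown with two characters. In the Python sketch below, the helper name and the sample characters are illustrative assumptions; it compares sorting by UTF-16 code unit with sorting by code point.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+FF21 is near the top of the BMP; U+10400 is a supplementary character.
words = ["\uFF21", "\U00010400"]

by_code_point = sorted(words)                   # Python compares code points
by_utf16_unit = sorted(words, key=utf16_units)  # binary order of 16-bit units
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print([hex(ord(w)) for w in by_code_point])
print([hex(ord(w)) for w in by_utf16_unit])
```

Because surrogates sit at 0xD800 to 0xDFFF, a code-unit comparison places supplementary characters before U+E000 to U+FFFF, while UTF-8 byte order agrees with code point order. That mismatch is the sort-order difference at issue in this thread.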

RE: Anti-UTF-16 Rant (was: Re: PDUTR #26 posted)

2001-09-14 Thread Carl W. Brown
Ken, I agree. Anyone who was an original Unicode evangelist with the loose leaf Unicode 1.0 binder in hand knows that if it were not for UCS-2 that Unicode would not be used today. It was a risk for MS to use Unicode in NT. It was a risk for MS to partially implement Unicode in Win95. It was
