----- Original Message -----
From: "WJCarpenter" <[EMAIL PROTECTED]>
To: "AbiWord Mailing List" <[EMAIL PROTECTED]>
Sent: Friday, July 21, 2000 7:01 AM
Subject: Re: smart quote algorithm

| koh> We can probably use all characters defined as 'Punctuation' in
| koh> the Unicode standard. These are marked as 'Po', e.g.:
|
| koh> 0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
|
| Too bad there is no UT_UCS_ispunct() and friends that we can
| transplant into Abi.  The idea of using the character description
| files from <http://www.unicode.org> has a pretty steep overhead,
| though that is really the only way to get it exactly right.

Would <URL: http://ustring.charabia.net/ > be of use?

"What can a Unicode library do for me?

Unicode stores characters in 16 (or 32) bits, which implies it can handle
European, Chinese, Hebrew, etc. characters. The character database gives
important information on the Unicode characters, and allows complete handling
of case mapping (upper, lower, and title case transformations). Normalization
forms can decompose the characters into letters and marks (diacritics), and
recompose them. If your programs use multiple charsets, multiple languages, or
need information on character properties (e.g. to get the upper case of a
letter, or remove the diacritics from a string), then you probably need
Unicode."
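
To make the 'Po' idea quoted above concrete, here is a rough sketch (in
Python, just for illustration) of pulling the general category out of
records in the UnicodeData.txt format, where the third semicolon-separated
field is the category. The names parse_record and build_category_set are
made up for this example and are not part of AbiWord or of the ustring
library; the two extra sample records are my own, written in the same
format as the line quoted above.

```python
import unicodedata


def parse_record(line):
    """Split one UnicodeData.txt-style record into (code point, name, category)."""
    fields = line.strip().split(";")
    code_point = int(fields[0], 16)  # field 0: code point in hexadecimal
    name = fields[1]                 # field 1: character name
    category = fields[2]             # field 2: general category, e.g. 'Po'
    return code_point, name, category


def build_category_set(lines, categories=("Po",)):
    """Collect the code points whose general category is in `categories`."""
    result = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        code_point, _name, category = parse_record(line)
        if category in categories:
            result.add(code_point)
    return result


# Sample records; the first is the line quoted in this thread.
SAMPLE = [
    "0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "002C;COMMA;Po;0;CS;;;;;N;;;;;",
]

punct = build_category_set(SAMPLE)

# Python's own unicodedata module exposes the same property, which gives
# a quick cross-check for the parsed table.
agrees = all(unicodedata.category(chr(cp)) == "Po" for cp in punct)
```

Building the set once at startup (or generating a static table at build
time) would avoid shipping and parsing the full data file at run time,
which is the overhead mentioned above. Whether 'Po' alone is the right
set for smart quotes is a separate question; the categories argument is
there so that other punctuation categories can be added.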

--
Karl Ove Hufthammer



