\p{} and \g{} in regexp

2002-07-23 Thread Theo Veenker
Hi, I have a few questions regarding Unicode regular expressions. 1) I'm working on a regexp matcher and I'd like to know which properties are never needed in a \p{...} item. Currently I have included the properties listed below, but for efficiency reasons I'd like to throw out what isn't
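For readers unfamiliar with \p{...} items, here is a minimal Python sketch of how a matcher might resolve one against the General_Category property. The helper name `matches_property` is hypothetical and not from the thread. It covers only General_Category values such as Lu or Nd, not the other properties the message asks about.

```python
import unicodedata

def matches_property(ch, prop):
    """Return True if the single character ch has General_Category prop.

    A one-letter prop such as "L" matches the whole group (Lu, Ll, Lt, ...).
    Hypothetical helper for illustration only.
    """
    return unicodedata.category(ch).startswith(prop)

print(matches_property("A", "Lu"))  # uppercase letter
print(matches_property("a", "L"))   # any letter
print(matches_property("a", "Lu"))  # lowercase is not Lu
```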

a brife dictionary for CJK character

2002-07-23 Thread Zhang Weiwu
Thank you to everyone who kindly helped me last time. I wasn't expecting so much help :) I believe there is a database at unicode.org that briefly describes the 'reason' for every CJK-A and CJK-B ideograph, because every character submitter must have provided basic information about the characters he

Re: a brife dictionary for CJK character

2002-07-23 Thread James Kass
Zhang Weiwu wrote: I believe there is a database at unicode.org that briefly describes the 'reason' for every CJK-A and CJK-B ideograph, because every character submitter must have provided basic information about the characters he submitted (the meaning of the character or the character it

Morse encoding (was: Re: Codes for codes for codes for...)

2002-07-23 Thread Anto'nio Martins-Tuva'lkin
At 20:18 +0430 2002-07-10, Roozbeh Pournader wrote: BTW, which characters should be used to encode the dot and dash of Morse in a typographically correct way? On 2002.07.10, 17:06, Michael Everson wrote: Depends on your font. Middle dot and hyphen possibly. It looks best

Re: Abstract character?

2002-07-23 Thread David Hopwood
Mark Davis wrote: A small correction to Ken's message: The Unicode scalar value definitionally excludes D800..DFFF, which are only code unit values used in UTF-16, and which are not code points associated with any well-formed UTF code

Re: Dublin Conference: Re: ISO/IEC 10646 versus Unicode

2002-07-23 Thread Marion Gunn
Arsa Lisa Moore: Dear Marion, After checking the mail lists upon returning from vacation/holiday, I found the following comment on the most recent Unicode conference in Dublin rather surprising... I missed reading mail over the wkend myself, Lisa - I was away cavorting in the provinces,

Re: Dublin Conference: Re: ISO/IEC 10646 versus Unicode

2002-07-23 Thread Peter_Constable
On 07/23/2002 07:14:31 AM Marion Gunn wrote: leading edge Unicode implementations. I particularly enjoyed hearing from a British mobile phone company at the Dublin conference... Most enjoyable, I'm sure. May we take it that, when Unicode next visits London/Berlin, people from

Re: Dublin Conference: Re: ISO/IEC 10646 versus Unicode

2002-07-23 Thread Sarasvati
Marion Gunn scripsit: I wish to invite useful suggestions on short-term IT projects from people able to honour confidentiality agreements and understand the technical aspects, to work for EGT on a paid or profit-sharing basis. ...and John Cowan replied: Advertising on this list is

Re: Dublin Conference: Re: ISO/IEC 10646 versus Unicode

2002-07-23 Thread Michael Everson
At 16:13 +0100 2002-07-23, Marion Gunn wrote: I do not understand John Cowan's anger, and I do not take personally his accusation of racial discrimination Well, you should. Because they were, however much you want to pretend that they were not. Basically you said It was wrong to have so many

Re: Dublin Conference: Re: ISO/IEC 10646 versus Unicode

2002-07-23 Thread Michael Everson
At 13:14 +0100 2002-07-23, Marion Gunn wrote: Unicode was a worthwhile project - just not worth the thousands per year it cost EGT! It was worth every single penny. I regret not one penny of the thousands per year which were spent between February 1994 and September 2001 on standardization

Re: Abstract character?

2002-07-23 Thread Markus Scherer
So far, the Unicode Standard has defined code points to be from the contiguous range of 0..0x10FFFF. Some definitions are fuzzy in the standard, with hopes of clarification in Unicode 4.0. It is true that UTF-16 cannot encode d800 dc00, but it can encode d800 0061 dc00. There are at least
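The point about d800 dc00 versus d800 0061 dc00 can be illustrated with a minimal Python sketch. The function `decode_utf16_units` is a hypothetical, lenient decoder written for this illustration: it passes unpaired surrogates through unchanged, which a strict UTF-16 decoder would reject.

```python
def decode_utf16_units(units):
    """Convert a list of 16-bit code units into a list of code points.

    A high surrogate (D800..DBFF) directly followed by a low surrogate
    (DC00..DFFF) is combined into one supplementary code point.
    Any other unit, including an unpaired surrogate, is passed through.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out

# Adjacent surrogates always fuse into one code point ...
print([hex(cp) for cp in decode_utf16_units([0xD800, 0xDC00])])
# ... but separated by another unit they stay unpaired.
print([hex(cp) for cp in decode_utf16_units([0xD800, 0x0061, 0xDC00])])
```

Because the pair d800 dc00 is indistinguishable from the encoding of U+10000, the sequence of two unpaired surrogates has no UTF-16 representation of its own.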

Re: \p{} and \g{} in regexp

2002-07-23 Thread Mark Davis
1. Here is my take, if you are trying to slim down: canonical combining class: This is really useful for matching. For example, if my source text is NFD and I want to recognize whatever is canonically equivalent to a-ring (with perhaps other accents), then I have to use something like the
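The snippet is cut off before the actual pattern, so the following is only a guess at the idea, sketched in Python with the standard `unicodedata` module. The name `starts_with_a_ring` is hypothetical. It assumes that in NFD text a ring above (U+030A, combining class 230) still attaches to the base "a" as long as only marks of a different nonzero combining class come between them.

```python
import unicodedata

RING_ABOVE = "\u030A"  # COMBINING RING ABOVE, canonical combining class 230

def starts_with_a_ring(text):
    """Return True if text begins with something canonically equivalent
    to a-ring, possibly carrying other accents as well.

    Hypothetical helper: normalizes to NFD, then scans the combining marks
    after the base letter using their canonical combining class (ccc).
    """
    s = unicodedata.normalize("NFD", text)
    if not s.startswith("a"):
        return False
    ring_ccc = unicodedata.combining(RING_ABOVE)
    for ch in s[1:]:
        if ch == RING_ABOVE:
            return True
        ccc = unicodedata.combining(ch)
        # A starter (ccc 0) ends the combining sequence; another mark of the
        # same class before the ring blocks it from attaching to the base.
        if ccc == 0 or ccc == ring_ccc:
            return False
    return False

print(starts_with_a_ring("\u00E5"))         # precomposed a-ring
print(starts_with_a_ring("a\u0323\u030A"))  # a + dot below + ring above
print(starts_with_a_ring("a\u0301\u030A"))  # acute comes first and blocks the ring
```

Without access to the combining class, a matcher could not tell the second case (equivalent to a-ring with dot below) from the third (a-acute with a ring on top).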

Re: Dublin Conference: Re: ISO/IEC 10646 versus Unicode

2002-07-23 Thread Michael Everson
At 13:22 -0400 2002-07-23, John Cowan wrote: In the context of an international conference, it is surely the organization that one represents that is relevant. Again I raise the example of Mr. Everson: did you expect him to wear a nametag saying Ireland/U.S.? Would it have led to anything but

Re: Abstract character?

2002-07-23 Thread Kenneth Whistler
Following up on several responses on this thread. Mark Davis said: A small correction to Ken's message: The Unicode scalar value definitionally excludes D800..DFFF, which are only code unit values used in UTF-16, and which are not code points associated with any

Re: Abstract character?

2002-07-23 Thread Doug Ewell
Kenneth Whistler kenw at sybase dot com wrote: UTF-16 does not allow the representation of an unpaired surrogate 0xD800 followed by another, coincidental unpaired surrogate 0xDC00. (It maps the two to U+10000.) Among the standard UTFs, only UTF-32 allows the two to be treated as unpaired

Re: Abstract character?

2002-07-23 Thread Doug Ewell
I typo'd: I suggest that UAX #18 be revised to state this unambiguously. s/#18/#19/ -Doug

Re: Programming graphic LCD modules

2002-07-23 Thread Roozbeh Pournader
On Tue, 23 Jul 2002, K. Sperling wrote: We are searching for character sets in bitmap format such as 5x7,..., 12x18,... in different languages (currently in Arabic) for programming graphic LCD modules. Is it possible to get a contact from you for more information about this? You