Hi,
I have a few questions regarding Unicode regular expressions.
1) I'm working on a regexp matcher and I'd like to know which properties
are never needed in a \p{...} item. Currently I have included the properties
listed below, but for efficiency reasons I'd like to throw out what isn't
Thank you to every nice guy who helped me last time. I wasn't expecting so much help--:)
I believe there is a database at unicode.org that briefly describes the 'reason' for every
CJK-A and CJK-B ideograph, because every character submitter must have provided
basic information about the characters he
Zhang Weiwu wrote,
I believe there is a database at unicode.org that briefly describes the 'reason' for every
CJK-A and CJK-B ideograph, because every
character submitter must have provided basic information about the characters he
submitted (the meaning of the character or the
character it
At 20:18 +0430 2002-07-10, Roozbeh Pournader wrote:
BTW, which characters should be used to encode the dot and dash of
Morse in a typographically correct way?
On 2002.07.10, 17:06, Michael Everson [EMAIL PROTECTED] wrote:
Depends on your font. Middle dot and hyphen possibly.
It looks best
Mark Davis wrote:
A small correction to Ken's message:
The Unicode scalar value
definitionally excludes D800..DFFF, which are only code unit
values used in UTF-16, and which are not code points associated
with any well-formed UTF code
Arsa Lisa Moore:
Dear Marion,
After checking the mail lists upon returning from vacation/holiday, I found
the following comment on the most recent Unicode conference in Dublin
rather surprising...
I missed reading mail over the wkend myself, Lisa - I was away cavorting
in the provinces,
On 07/23/2002 07:14:31 AM Marion Gunn wrote:
leading edge Unicode implementations. I particularly enjoyed hearing from
a British mobile phone company at the Dublin conference...
Most enjoyable, I'm sure. May we take it that, when Unicode next visits
London/Berlin, people from
Marion Gunn scripsit:
I wish to invite useful suggestions on short-term IT projects from
people able to honour confidentiality agreements and understand the
technical aspects, to work for EGT on a paid or profit-sharing basis.
...and John Cowan replied:
Advertising on this list is
At 16:13 +0100 2002-07-23, Marion Gunn wrote:
I do not understand John Cowan's anger, and I do not take personally his
accusation of racial discrimination
Well, you should. Because they were, however much you want to pretend
that they were not. Basically you said It was wrong to have so many
At 13:14 +0100 2002-07-23, Marion Gunn wrote:
Unicode was a worthwhile project - just not worth the thousands per
year it cost EGT!
It was worth every single penny. I regret not one penny of the
thousands per year which were spent between February 1994 and
September 2001 on standardization
So far, the Unicode Standard has defined code points to be from the contiguous range
of 0..0x10FFFF.
Some definitions are fuzzy in the standard, with hopes of clarification in Unicode 4.0.
It is true that UTF-16 cannot encode d800 dc00, but it can encode d800 0061 dc00.
There are at least
1. Here is my take, if you are trying to slim down:
canonical combining class
This is really useful for matching. For example, if my source text is
NFD and I want to recognize whatever is canonically equivalent to
a-ring (with perhaps other accents), then I have to use something like
the
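A minimal Python sketch of the point about canonical combining classes, assuming the standard `unicodedata` module. The helper names `combining_classes` and `canonically_equivalent` are hypothetical, chosen only for illustration. It shows why a matcher working on NFD text needs the combining class: canonical reordering can place another accent between the base letter and the ring.

```python
import unicodedata

def combining_classes(text):
    """Return the canonical combining class of each character of text in NFD."""
    return [unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", text)]

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# U+00E5 (a-ring) decomposes to 'a' + U+030A COMBINING RING ABOVE (class 230).
a_ring = "\u00E5"
print(combining_classes(a_ring))

# With a dot below (U+0323, class 220) added, NFD sorts the dot before the ring,
# so the ring is no longer adjacent to the base letter.
a_ring_dot = "\u00E5\u0323"
print([hex(ord(ch)) for ch in unicodedata.normalize("NFD", a_ring_dot)])
print(canonically_equivalent(a_ring_dot, "a\u0323\u030A"))
```

A literal search for 'a' followed immediately by U+030A would miss the second string, which is why a matcher has to skip over marks with a different, non-zero combining class.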
At 13:22 -0400 2002-07-23, John Cowan wrote:
In the context of an international conference, it is surely the organization
that one represents that is relevant. Again I raise the example of Mr.
Everson: did you expect him to wear a nametag saying Ireland/U.S.?
Would it have led to anything but
Following up on several responses on this thread.
Mark Davis said:
A small correction to Ken's message:
The Unicode scalar value
definitionally excludes D800..DFFF, which are only code unit
values used in UTF-16, and which are not code points associated
with any
Kenneth Whistler kenw at sybase dot com wrote:
UTF-16 does not allow the representation of an unpaired surrogate
0xD800 followed by another, coincidental unpaired surrogate 0xDC00.
(It maps the two to U+10000.) Among the standard UTFs, only UTF-32
allows the two to be treated as unpaired
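The surrogate arithmetic behind this can be sketched in Python. This is an illustrative decoder only, not a conformant one: the function name `decode_utf16_units` is hypothetical, and passing unpaired surrogates through unchanged is an assumption made here to show the contrast (a strict decoder would reject them).

```python
def decode_utf16_units(units):
    """Turn a list of 16-bit code units into a list of integer values.

    A high surrogate (D800..DBFF) immediately followed by a low surrogate
    (DC00..DFFF) is combined into one supplementary code point. Any other
    unit, including an unpaired surrogate, is passed through unchanged
    (an assumption for illustration only).
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Standard UTF-16 pairing formula.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out

# Adjacent D800 DC00 always reads as a pair, i.e. U+10000.
print([hex(v) for v in decode_utf16_units([0xD800, 0xDC00])])

# Separated by 0061, the same two surrogates stay unpaired.
print([hex(v) for v in decode_utf16_units([0xD800, 0x0061, 0xDC00])])
```

The first call yields a single value and the second yields three, which is the distinction drawn in the thread between the sequences d800 dc00 and d800 0061 dc00.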
I typo'd:
I suggest that UAX #18 be revised to
state this unambiguously.
s/#18/#19/
-Doug
On Tue, 23 Jul 2002, K. Sperling wrote:
We are searching for character sets in bitmap format such as
5x7,..., 12x18,... in different languages (currently in
Arabic) for programming graphic LCD modules.
Is it possible to get a contact from you for more
information about this?
You
17 matches