No, the general category is not enough. Please read both references.
As you can tell from DerivedCoreProperties.txt, for example:
# Derived Property: Uppercase
# Generated from: Lu + Other_Uppercase
So general category Lu is not the same thing as the "Uppercase" property.
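To make the difference concrete, here is a small sketch using only generalCategory from Data.Char. The sample code points are my own picks: as far as I know, U+2160 and U+24B6 are both listed under Other_Uppercase in PropList.txt (so they have the derived Uppercase property), yet neither has general category Lu. The helper names isLu, samples and report are just illustrative.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- True only for characters whose General Category is Lu.
isLu :: Char -> Bool
isLu c = generalCategory c == UppercaseLetter

-- Hand-picked sample code points (an assumption for illustration,
-- not a complete list of Other_Uppercase characters).
samples :: [(String, Char)]
samples =
  [ ("LATIN CAPITAL LETTER A", 'A')
  , ("ROMAN NUMERAL ONE", '\x2160')
  , ("CIRCLED LATIN CAPITAL LETTER A", '\x24B6')
  ]

-- Pair each sample name with the General Category reported by base.
report :: [(String, GeneralCategory)]
report = [ (name, generalCategory c) | (name, c) <- samples ]
```

Evaluating report in GHCi should show that only the first sample is an UppercaseLetter, which is why a test based on Lu alone misses characters that the derived property includes.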
Deborah
On Aug 25, 2008, at 7:18 PM, Maurício wrote:
In chapter 4 I see the following
nice table on page 139. Do you think
I can use it together with UnicodeData.txt
to choose valid characters for Haskell?
It is the only place I found where the names
match the Haskell syntax reference
(uppercase, lowercase, punctuation, symbol).
Thanks,
Maurício
Table 4-7. General Category
Lu = Letter, uppercase
Ll = Letter, lowercase
Lt = Letter, titlecase
Lm = Letter, modifier
Lo = Letter, other
Mn = Mark, nonspacing
Mc = Mark, spacing combining
Me = Mark, enclosing
Nd = Number, decimal digit
Nl = Number, letter
No = Number, other
Pc = Punctuation, connector
Pd = Punctuation, dash
Ps = Punctuation, open
Pe = Punctuation, close
Pi = Punctuation, initial quote (may behave like Ps or Pe depending on usage)
Pf = Punctuation, final quote (may behave like Ps or Pe depending on usage)
Po = Punctuation, other
Sm = Symbol, math
Sc = Symbol, currency
Sk = Symbol, modifier
So = Symbol, other
Zs = Separator, space
Zl = Separator, line
Zp = Separator, paragraph
Cc = Other, control
Cf = Other, format
Cs = Other, surrogate
Co = Other, private use
Cn = Other, not assigned (including noncharacters)
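For what it's worth, the GeneralCategory type in Data.Char has one constructor per row of this table, declared in the same order. A rough sketch of looking up the two-letter abbreviation for a character follows; abbreviations, abbrev and categoryOf are hypothetical helper names, not part of base.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Two-letter abbreviations from Table 4-7, listed in the declaration
-- order of Data.Char's GeneralCategory constructors.
abbreviations :: [String]
abbreviations =
  [ "Lu", "Ll", "Lt", "Lm", "Lo"
  , "Mn", "Mc", "Me"
  , "Nd", "Nl", "No"
  , "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"
  , "Sm", "Sc", "Sk", "So"
  , "Zs", "Zl", "Zp"
  , "Cc", "Cf", "Cs", "Co", "Cn"
  ]

-- Abbreviation for a category (hypothetical helper, not in base).
-- Relies on the list above having one entry per constructor.
abbrev :: GeneralCategory -> String
abbrev cat = abbreviations !! fromEnum cat

-- Abbreviation of the General Category of a single character.
categoryOf :: Char -> String
categoryOf = abbrev . generalCategory
```

The answers come from whatever Unicode tables your version of base ships with, so they may lag behind the current UnicodeData.txt.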
Deborah Goldsmith wrote:
You can't determine Unicode character properties by analyzing the
names of the characters.
Read chapter 4 of the standard:
http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf
and get the property values here:
http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
It sounds like the properties you want are "Case" and "General
Category". Maybe the spec should be more explicit on exactly how
the definitions map onto Unicode properties, so there is no
ambiguity.
Deborah
On Aug 25, 2008, at 6:15 PM, Maurício wrote:
Hi,
In the Haskell reference, I see the
following definitions:
uniWhite -> any Unicode character defined
as whitespace;
uniSmall -> any Unicode lowercase letter;
uniLarge -> any uppercase or titlecase
Unicode letter;
uniSymbol -> any Unicode symbol or
punctuation.
Where do I get lists for those
characters? My first attempt was to
check:
http://unicode.org/Public/UNIDATA/UnicodeData.txt
and consider large anything marked as
CAPITAL and small anything marked as SMALL. I
didn't know what to guess about the symbols.
Am I using the right reference? How can I
recognize (or get a list of) valid uppercase and
lowercase Unicode letters, as well as symbols
and punctuation?
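One possible reading of those definitions, sketched with generalCategory from Data.Char rather than by parsing character names. This is an assumption about how the report's wording maps onto Unicode, not something the report states: it treats "lowercase letter" as Ll, "uppercase or titlecase letter" as Lu or Lt, and "symbol or punctuation" as every P* and S* category. It leaves uniWhite out, and it ignores any characters the report's grammar excludes separately. LexClass and classify are made-up names.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical lexical classes, named after the report's nonterminals.
data LexClass = UniSmall | UniLarge | UniSymbol | Other
  deriving (Eq, Show)

-- Classify a character by General Category alone (an assumed mapping).
classify :: Char -> LexClass
classify c
  | cat == LowercaseLetter = UniSmall
  | cat `elem` [UppercaseLetter, TitlecaseLetter] = UniLarge
  -- Pc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So are contiguous
  -- constructors of GeneralCategory, so an enumeration covers them.
  | cat `elem` [ConnectorPunctuation .. OtherSymbol] = UniSymbol
  | otherwise = Other
  where
    cat = generalCategory c
```

For example, classify 'a' gives UniSmall and classify '+' gives UniSymbol. Characters whose case comes only from a derived property, not from Lu, Ll or Lt, fall into Other under this reading.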
Thanks for your help,
Maurício
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe