Re: [Haskell-cafe] Re: Valid Haskell characters

Deborah Goldsmith Mon, 25 Aug 2008 20:30:51 -0700

No, the general category is not enough. Please read both references. As you can tell from DerivedCoreProperties.txt, for example:


# Derived Property: Uppercase
#  Generated from: Lu + Other_Uppercase


So general category Lu is not the same thing as "Uppercase".
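
To make the distinction concrete, here is a minimal Haskell sketch. It assumes only `generalCategory` from `Data.Char` in GHC's base library; the helper `isLu` and the `examples` list are hypothetical names chosen for illustration. The last two characters are listed under Other_Uppercase in PropList.txt, yet their general category is not Lu, so a test on the general category alone would miss them.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Illustrative sample: one ordinary capital letter plus two characters
-- that PropList.txt lists under Other_Uppercase.
examples :: [(Char, String)]
examples =
  [ ('A',      "LATIN CAPITAL LETTER A")
  , ('\x2160', "ROMAN NUMERAL ONE")
  , ('\x24B6', "CIRCLED LATIN CAPITAL LETTER A")
  ]

-- True only when the general category is exactly Lu.
isLu :: Char -> Bool
isLu c = generalCategory c == UppercaseLetter

main :: IO ()
main = mapM_ report examples
  where
    report (c, name) =
      putStrLn (name ++ ": " ++ show (generalCategory c)
                     ++ ", Lu? " ++ show (isLu c))
```

Running it prints the general category of each sample character next to the result of the Lu-only test.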

Deborah

On Aug 25, 2008, at 7:18 PM, Maurício wrote:

In chapter 4 I see the following
nice table on page 139. Do you think
I can use it together with UnicodeData.txt
to choose valid characters for Haskell?
This is the only place I found where the names
match the Haskell syntax reference
(uppercase, lowercase, punctuation, symbol).

Thanks,
Maurício

                      Table 4-7. General Category

Lu = Letter, uppercase
Ll = Letter, lowercase
Lt = Letter, titlecase
Lm = Letter, modifier
Lo = Letter, other
Mn = Mark, nonspacing
Mc = Mark, spacing combining
Me = Mark, enclosing
Nd = Number, decimal digit
Nl = Number, letter
No = Number, other
Pc = Punctuation, connector
Pd = Punctuation, dash
Ps = Punctuation, open
Pe = Punctuation, close

Pi = Punctuation, initial quote (may behave like Ps or Pe depending on usage)
Pf = Punctuation, final quote (may behave like Ps or Pe depending on usage)

Po = Punctuation, other
Sm = Symbol, math
Sc = Symbol, currency
Sk = Symbol, modifier
So = Symbol, other
Zs = Separator, space
Zl = Separator, line
Zp = Separator, paragraph
Cc = Other, control
Cf = Other, format
Cs = Other, surrogate
Co = Other, private use
Cn = Other, not assigned (including noncharacters)
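
As a rough illustration of how the table could be used from Haskell, the sketch below maps the general categories onto the lexical classes named in the Haskell reference. It relies only on `generalCategory` from `Data.Char`; the type `LexClass` and the function `lexClass` are hypothetical names, and the mapping itself is an assumption, not something the report specifies. As noted in the reply above, the general category alone does not coincide with derived properties such as Uppercase, so treat this as an approximation.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical lexical classes, named after uniSmall, uniLarge,
-- uniSymbol and uniWhite in the Haskell reference.
data LexClass = Small | Large | Symbol | White | Other
  deriving (Eq, Show)

-- Assumed mapping based on the general category only.
-- Limitation: tab and newline have category Cc (Control), so they
-- fall into Other here rather than White.
lexClass :: Char -> LexClass
lexClass c = case generalCategory c of
  LowercaseLetter      -> Small   -- Ll
  UppercaseLetter      -> Large   -- Lu
  TitlecaseLetter      -> Large   -- Lt
  ConnectorPunctuation -> Symbol  -- Pc
  DashPunctuation      -> Symbol  -- Pd
  OpenPunctuation      -> Symbol  -- Ps
  ClosePunctuation     -> Symbol  -- Pe
  InitialQuote         -> Symbol  -- Pi
  FinalQuote           -> Symbol  -- Pf
  OtherPunctuation     -> Symbol  -- Po
  MathSymbol           -> Symbol  -- Sm
  CurrencySymbol       -> Symbol  -- Sc
  ModifierSymbol       -> Symbol  -- Sk
  OtherSymbol          -> Symbol  -- So
  Space                -> White   -- Zs
  _                    -> Other

main :: IO ()
main = mapM_ (\c -> putStrLn (show c ++ " -> " ++ show (lexClass c)))
             "aA+ 1"
```

A real lexer would also have to exclude the characters the reference treats specially (parentheses, brackets, quotes and so on) from the symbol class; that step is left out here.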




Deborah Goldsmith wrote:

You can't determine Unicode character properties by analyzing the names of the characters.

Read chapter 4 of the standard:
http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf
and get the property values here:
http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

It sounds like the properties you want are "Case" and "General Category". Maybe the spec should be more explicit on exactly how the definitions map onto Unicode properties, so there is no ambiguity.

Deborah
On Aug 25, 2008, at 6:15 PM, Maurício wrote:

Hi,

In the Haskell reference, I see the
following definitions:

uniWhite -> any Unicode character defined
as whitespace;

uniSmall -> any Unicode lowercase letter;

uniLarge -> any uppercase or titlecase
Unicode letter;

uniSymbol -> any Unicode symbol or
punctuation.

Where do I get lists for those
characters? My first attempt was to
check:

http://unicode.org/Public/UNIDATA/UnicodeData.txt

and consider large anything marked as
CAPITAL and small anything marked as SMALL. I
didn't know what to guess about the symbols.
Am I using the right reference? How can I
recognize (or get a list of) valid uppercase and
lowercase Unicode letters, as well as symbols
and punctuation?

Thanks for your help,
Maurício

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

