RE: regexp and defining a custom character class

Podlesak Kamil Thu, 06 Dec 2007 02:48:42 -0800

> Great. That is exactly what I need. Where does one find out about 
> something like this? 
> The Javadoc (version 6) only mentions \p{Lu}.


That is just an example. It is documented in more detail a bit further down:

Unicode blocks and categories are written with the \p and \P constructs as in 
Perl. \p{prop} matches if the input has the property prop, while \P{prop} does 
not match if the input has that property. Blocks are specified with the prefix 
In, as in InMongolian. Categories may be specified with the optional prefix Is: 
Both \p{L} and \p{IsL} denote the category of Unicode letters. Blocks and 
categories can be used both inside and outside of a character class.
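The constructs from the quoted Javadoc passage can be tried out with a minimal sketch like the one below. The class name UnicodePropertyDemo and its helper methods are hypothetical, chosen only for illustration; the only API used is java.util.regex.Pattern. Non-ASCII characters are written as \u escapes so that the source file encoding does not matter.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class UnicodePropertyDemo {

    // \p{L}: the whole input must consist of Unicode letters.
    static boolean allLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    // \p{IsL}: the same category written with the optional "Is" prefix.
    static boolean allLettersIsPrefix(String s) {
        return Pattern.matches("\\p{IsL}+", s);
    }

    // \P{L}: the whole input must consist of characters that are NOT letters.
    static boolean noLetters(String s) {
        return Pattern.matches("\\P{L}+", s);
    }

    // \p{InMongolian}: a block is selected with the "In" prefix.
    static boolean inMongolianBlock(String s) {
        return Pattern.matches("\\p{InMongolian}+", s);
    }

    // A property used inside a character class, combined with \d.
    static boolean lettersOrDigits(String s) {
        return Pattern.matches("[\\p{L}\\d]+", s);
    }

    public static void main(String[] args) {
        System.out.println(allLetters("kr\u00e1t"));         // true
        System.out.println(allLettersIsPrefix("kr\u00e1t")); // true
        System.out.println(noLetters("123"));                // true
        System.out.println(inMongolianBlock("\u1820"));      // true (MONGOLIAN LETTER A)
        System.out.println(lettersOrDigits("3kr\u00e1t"));   // true
    }
}
```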

The Unicode blocks and categories can be found in the Unicode standard. A 
ready-made list is at:

http://www.regular-expressions.info/unicode.html

> One more question: when I want case-insensitive matching,
> 
> (?iu)(\\d+)([a-záčďéěíňóřšťůúýž]+)
> works as expected (it matches "3KrÁt"),
> 
> whereas
> 
> (?iu)(\\d+)([\\p{Ll}]+)
> searches case-sensitively (it does not match "3KrÁt", but does match "3krát").
> 
> Why?

Because \p{Ll} is explicitly lowercase, and only lowercase.
I don't really see the point of doing a case-insensitive match on lowercase 
letters... wouldn't it be better to simply use \p{L}?
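A minimal sketch of the difference, using the two sample inputs from the question. The class name LetterCategoryDemo and its methods are hypothetical. The sketch deliberately leaves out the (?iu) flags: how \p{Ll} interacts with case-insensitive matching may differ between Java versions, while without the flag \p{Ll} is strictly lowercase and \p{L} accepts letters of any case.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class LetterCategoryDemo {

    // Digits followed by letters of any case (category L).
    private static final Pattern ANY_CASE =
            Pattern.compile("(\\d+)(\\p{L}+)");

    // Digits followed by lowercase letters only (category Ll).
    private static final Pattern LOWER_ONLY =
            Pattern.compile("(\\d+)(\\p{Ll}+)");

    static boolean matchesAnyCase(String s) {
        return ANY_CASE.matcher(s).matches();
    }

    static boolean matchesLowerOnly(String s) {
        return LOWER_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String mixed = "3Kr\u00c1t"; // "3KrÁt"
        String lower = "3kr\u00e1t"; // "3krát"

        System.out.println(matchesAnyCase(mixed));   // true
        System.out.println(matchesAnyCase(lower));   // true
        System.out.println(matchesLowerOnly(mixed)); // false
        System.out.println(matchesLowerOnly(lower)); // true
    }
}
```

With \p{L} no case-insensitivity flag is needed at all, because the category already covers uppercase, lowercase and titlecase letters.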

> (I know that (?iu)(\\d+)([\\p{Ll}\\p{Lu}]+) works, but that is not a 
> case-insensitive search) 

> kolisko
