Re: [PHP-DEV] Re: PHP 5.2.1RC3 Released

Nuno Lopes Thu, 25 Jan 2007 11:15:53 -0800

I've been thinking about how to not force UTF-8 in PCRE for PHP 6, and it's not that simple. This is mainly due to preg_replace(), because it allows array() parameters that can contain mixed IS_UNICODE and IS_STRING values. I hope you realize, though, that in UTF-8 mode PCRE does not care about POSIX locales, even in PHP 5.

I hadn't thought about that, but can't you simply reject mixing of Unicode and binary strings?

By the way, I think the ICU regexp extension, when implemented, will let you match Portuguese characters in UTF-8 strings.

I wasn't aware of that API. Anyway, it is probably slower than pcre+locales (because it uses Unicode property table lookups).

Yes, UTF-8 covers many aspects, but does it know about words, white
space (not sure if whitespace is always the same) and other locale-specific
issues? I mean in general, not only in pcre. Maybe it is more something for ICU
directly, as you said later in this thread.

That's not really a problem with pcre, as it supports Unicode character properties. It isn't documented in phpdoc (don't look at me :P), but it looks like:

\pL
where L is one of (from http://pcre.org/pcre.txt):
        L     Letter
        Ll    Lower case letter
        N     Number
        Nd    Decimal number
        Nl    Letter number
        No    Other number
        P     Punctuation
        Zs    Space separator
(...)
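
To make that concrete, here is a rough sketch of how the property classes could be used from PHP. It assumes PCRE was built with UTF-8 and Unicode property support and that the script file itself is saved as UTF-8; the sample string and variable names are only for illustration.

```php
<?php
// Sketch only: match words in a UTF-8 string without relying on POSIX locales.
// Assumes PCRE has UTF-8 + Unicode property support compiled in.
$text = "Olá, coração: 3 maçãs";

// \pL = any Unicode letter; the /u modifier switches PCRE to UTF-8 mode.
preg_match_all('/\pL+/u', $text, $words);
echo implode(", ", $words[0]), "\n";   // Olá, coração, maçãs

// Two-letter properties need braces: \p{Nd} = decimal number,
// \p{Zs} = space separator.
$digits = preg_match_all('/\p{Nd}/u', $text, $m);
$spaces = preg_match_all('/\p{Zs}/u', $text, $m);
echo "$digits digit(s), $spaces space(s)\n";   // 1 digit(s), 3 space(s)
```

Without the /u modifier the subject is treated as single bytes, so the accented letters would not be matched as letters by \pL.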

Nuno

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
