On Thu, 24 Sep 2020 02:56:51 +0200, Andrew Hewus Fresh <and...@afresh1.com> wrote:
On Wed, Sep 23, 2020 at 09:11:44AM +0200, Boudewijn Dijkstra wrote:
On Thu, 10 Sep 2020 04:01:30 +0200, Bambero <bamb...@gmail.com> wrote:
> Hi,
>
> It seems that Perl regular expressions lost one Polish letter (ą):
> https://www.compart.com/en/unicode/U+0105
>
> I can see this problem only under OpenBSD 6.7 with php-7.4 (the same version of PHP under Linux is OK)
>
> Ex.:
>
> PHP 7.4.10 or 7.4.5
> <?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
> int(1) // OK
>
> PHP 7.4.10 or 7.4.5
> <?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
> int(0) // OOPS???
>
> PHP 7.3.21
> <?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
> int(1) // OK
>
> PHP 7.3.21
> <?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
> int(1) // OK
>
> Any ideas how to fix that?
>
> Regards,
> Bambero

The same happens with any UTF-8 sequence that ends in the byte 0x85. I guess (a part of) PHP's PCRE code is not in UTF-8 mode, so that byte is taken for CHAR_NEL (= 0x85) and handled as a line break, which "." does not match.
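
To make the byte-level difference between the two test strings concrete, here is a minimal sketch that dumps the UTF-8 encoding of both letters. The helper name utf8_hex() is hypothetical and not from this thread; it only wraps bin2hex().

```php
<?php
// Hypothetical helper (not from the thread): hex-dump the raw bytes of a string.
function utf8_hex(string $s): string
{
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

// "\u{...}" escapes (PHP 7.0+) always yield UTF-8, whatever the file encoding.
printf("U+0119 => %s\n", utf8_hex("\u{0119}")); // ę: C4 99
printf("U+0105 => %s\n", utf8_hex("\u{0105}")); // ą: C4 85, last byte is 0x85
```

Only the second letter ends in 0x85, which fits the observation that 'daswęzdas' matches while 'daswązdas' does not.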

I don't know a lot about PHP or the external PCRE library, but my guess
would be that php is treating the string as bytes not characters.  Can
you try using the "u" (PCRE_UTF8) modifier?

https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
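
A minimal sketch of that suggestion, reusing the pattern and the failing subject from the original report. The variable names are illustrative only. The result without "u" is deliberately not asserted, since it is exactly what differs between the systems discussed here.

```php
<?php
// Subject from the report, 'daswązdas', written with an escape so the
// bytes are UTF-8 regardless of how this file is saved.
$subject = "dasw\u{0105}zdas";

// With "u", pattern and subject are treated as UTF-8, so "." consumes
// the whole two-byte letter as one character.
$withU = preg_match('/^.{5,64}$/u', $subject);

// Without "u", the subject is matched byte by byte; the result depends
// on how the PCRE library in use handles the 0x85 byte.
$withoutU = preg_match('/^.{5,64}$/', $subject);

var_dump($withU);    // int(1)
var_dump($withoutU); // build-dependent: int(1) or int(0)
```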

Indeed, with "u" the expected 1 is returned! Now the question is: why is this needed on OpenBSD but not on Linux or Windows?




--
Made with Opera's e-mail program: http://www.opera.com/mail/
