On Thu, Sep 24, 2020 at 11:30:35AM +0200, Boudewijn Dijkstra wrote:
> Op Thu, 24 Sep 2020 02:56:51 +0200 schreef Andrew Hewus Fresh
> <and...@afresh1.com>:
> > On Wed, Sep 23, 2020 at 09:11:44AM +0200, Boudewijn Dijkstra wrote:
> > > Op Thu, 10 Sep 2020 04:01:30 +0200 schreef Bambero <bamb...@gmail.com>:
> > > > Hi,
> > > >
> > > > It seems that perl regular expressions lost one Polish letter (ą):
> > > > https://www.compart.com/en/unicode/U+0105
> > > >
> > > > I can see this problem only under OpenBSD 6.7 with php-7.4 (same
> > > > version of php under linux is OK)
> > > >
> > > > Ex.:
> > > >
> > > > PHP 7.4.10 or 7.4.5
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
> > > > int(1) // OK
> > > >
> > > > PHP 7.4.10 or 7.4.5
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
> > > > int(0) // UPS???
> > > >
> > > > PHP 7.3.21
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
> > > > int(1) // OK
> > > >
> > > > PHP 7.3.21
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
> > > > int(1) // OK
> > > >
> > > > Any ideas how to fix that?
> > > >
> > > > Regards,
> > > > Bambero
> > > 
> > > The same happens with any UTF-8 sequence that ends in 0x85.  I guess
> > > (a part of) PHP's PCRE code is not in UTF-8 mode, causing triggers
> > > on CHAR_NEL (=0x85).
> > 
> > I don't know a lot about PHP or the external PCRE library, but my guess
> > would be that php is treating the string as bytes not characters.  Can
> > you try using the "u" (PCRE_UTF8) modifier?
> > 
> > https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
> 
> Indeed with "u" the expected 1 is returned! Now the question is, why is this
> needed on OpenBSD but not in Linux or Windows?

There are many Unicode-related changes in php 7.4, so I'm sure they
fixed something.
https://www.php.net/ChangeLog-7.php

I would guess that Linux and Windows both default to a UTF-8 locale,
while OpenBSD defaults to the C locale.

Does the output from locale(1) provide you any hints?

Do you get any different results testing it with `LC_ALL=en_US.UTF-8`?

I don't know enough about php to know how it determines what locale to
use, so that may not have any effect, or you may need to adjust
something else.
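
If it helps to narrow things down, here is a rough sketch of what could
be compared between the two systems. It only uses the pattern and string
from the original report, written with a "\u{0105}" escape so the result
does not depend on the file's encoding. I am assuming nothing about how
php picks its locale; the setlocale() call just reports the current
LC_CTYPE value, and the result of the first preg_match() is the part
that was reported to differ between systems.

```php
<?php
// The subject from the report; "\u{0105}" is the letter ą.
$subject = "dasw\u{0105}zdas";

// Raw UTF-8 bytes of ą: the second byte is 0x85, the value that
// CHAR_NEL has when the string is read as single bytes.
echo bin2hex("\u{0105}"), "\n";   // c485

// Without "u" the subject is matched byte by byte. This is the call
// that reportedly returned 0 on OpenBSD and 1 elsewhere.
var_dump(preg_match('/^.{5,64}$/', $subject));

// With "u" (PCRE_UTF8) the subject is matched as UTF-8 characters.
var_dump(preg_match('/^.{5,64}$/u', $subject));   // int(1)

// Report the current LC_CTYPE setting without changing it.
var_dump(setlocale(LC_CTYPE, '0'));
```

Running the same file with and without `LC_ALL=en_US.UTF-8` in the
environment should show whether the locale makes any difference to the
second line of output.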

l8rZ,
-- 
andrew - http://afresh1.com

Adding manpower to a late software project makes it later.
