On 03.05.2019 at 01:18, Björn Larsson wrote:

> On 2019-04-11 at 15:41, Christoph M. Becker wrote:
>
>> On 02.04.2019 at 11:42, Nicolai Scheer wrote:
>>
>>> I'm currently in the process of migrating an old application from PHP
>>> 5.6 to 7.2.
>>> In the process, I fiddled with the default_charset ini setting.
>>>
>>> The documentation states (cf.
>>> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
>>>
>>> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
>>> default_charset will also be used to set the default character set for
>>> [...] and for mbstring functions if the mbstring.http_input
>>> mbstring.http_output mbstring.internal_encoding configuration option is
>>> unset."
>>>
>>> As such, I'd expect to be able to set default_charset to iso-8859-1 and
>>> mbstring to pick that same setting for its internal encoding (if the
>>> mentioned directives are unset, that is).
>>>
>>> This seems not to be the case:
>>>
>>> <?php
>>> ini_set( 'default_charset', 'iso-8859-1' );
>>> var_dump( ini_get("mbstring.internal_encoding") );
>>> var_dump( ini_get("mbstring.http_input") );
>>> var_dump( ini_get("mbstring.http_output") );
>>> echo mb_internal_encoding() . "\n";
>>> echo mb_strlen( "\xc3\xb6" ) . "\n";
>>> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
>>>
>>> This outputs (7.2.15 on a CentOS box):
>>> string(0) ""
>>> string(0) ""
>>> string(0) ""
>>> UTF-8
>>> 1
>>> 2
>>>
>>> The default_charset is set but the mbstring settings are not, so I'd
>>> expect to get 2 as the character/byte count in both cases.
>>>
>>> If I throw a mb_internal_encoding("iso-8859-1") in the mix, both string
>>> lengths are equal.
>>>
>>> Since the mentioned mbstring directives are deprecated as of 5.6.0, do I
>>> really need to use mb_internal_encoding() instead?
>>> Is the documentation wrong or am I just misinterpreting it? I thought
>>> that default_charset should act as some kind of "master setting" in
>>> order not to have to set all specific settings as well (e.g. iconv,
>>> mbstring).
>>>
>>> Usually we use UTF-8, so I did not come across this before...
>>>
>>> Any insight?
>>
>> <https://3v4l.org/ZvQ67> confirms the reported behavior, and so does a
>> quick look at the code.  I suggest you file a ticket on
>> <https://bugs.php.net/>.
>
> Did this lead to a bug report?

Hmm, apparently not.

> It led to a bug in Smarty 3.1.33 for me. I got a warning about
> "mbregex compile err: invalid code point value" in mb_split().
> My content is in ISO-8859-1, and Smarty's normal procedure of
> setting its encoding and the php.ini setting to ISO-8859-1 failed.
>
> However mb_regex_encoding('ISO-8859-1') did the trick!

While the RFC[1] states

| all functions that take encoding option use php.internal_encoding as
| default (e.g. htmlentities/mb_strlen/mb_regex/etc)

apparently this has not been implemented (yet).
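
Until then, the workarounds mentioned in this thread can be combined into
one explicit step.  The following is only a minimal sketch, assuming the
mbstring extension is loaded; sync_mbstring_with_default_charset() is a
hypothetical helper name, not part of PHP.  It copies default_charset into
the mbstring internal encoding and, where mbregex is available, into the
mbregex encoding:

```php
<?php
// Hypothetical helper (not a PHP API): propagate default_charset to
// mbstring explicitly, since mbstring does not pick it up on its own
// in the behavior reported in this thread.
function sync_mbstring_with_default_charset(): bool
{
    $charset = (string) ini_get('default_charset');
    if ($charset === '') {
        return false; // nothing to propagate
    }
    try {
        // Returns false (with a warning) on an unknown encoding in PHP 7.
        $ok = mb_internal_encoding($charset);
        // mb_regex_encoding() only exists if mbregex support is compiled in.
        if ($ok && function_exists('mb_regex_encoding')) {
            $ok = mb_regex_encoding($charset);
        }
    } catch (\ValueError $e) {
        // PHP 8 throws instead of returning false for an unknown encoding.
        return false;
    }
    return $ok;
}

ini_set('default_charset', 'ISO-8859-1');
sync_mbstring_with_default_charset();

echo mb_internal_encoding(), "\n";          // ISO-8859-1
echo mb_strlen("\xc3\xb6"), "\n";           // 2
echo mb_strlen("\xc3\xb6", '8bit'), "\n";   // 2
```

With the internal encoding set this way, both lengths should come out
equal, which matches what Nicolai saw after calling
mb_internal_encoding("iso-8859-1") by hand.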

[1] <https://wiki.php.net/rfc/default_encoding>

--
Christoph M. Becker

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
