On 03.05.2019 at 01:18, Björn Larsson wrote:

> On 2019-04-11 at 15:41, Christoph M. Becker wrote:
>
>> On 02.04.2019 at 11:42, Nicolai Scheer wrote:
>>
>>> I'm currently in the process of migrating an old application from PHP
>>> 5.6 to 7.2.
>>> In the process, I fiddled with the default_charset ini setting.
>>>
>>> The documentation states (cf.
>>> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
>>>
>>> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
>>> default_charset will also be used to set the default character set for
>>> [...] and for mbstring functions if the mbstring.http_input
>>> mbstring.http_output mbstring.internal_encoding configuration option
>>> is unset."
>>>
>>> As such, I'd expect to be able to set default_charset to iso-8859-1 and
>>> mbstring to pick that same setting for its internal encoding (if the
>>> mentioned directives are unset, that is).
>>>
>>> This seems not to be the case:
>>>
>>> <?php
>>> ini_set( 'default_charset', 'iso-8859-1' );
>>> var_dump( ini_get("mbstring.internal_encoding") );
>>> var_dump( ini_get("mbstring.http_input") );
>>> var_dump( ini_get("mbstring.http_output") );
>>> echo mb_internal_encoding() . "\n";
>>> echo mb_strlen( "\xc3\xb6" ) . "\n";
>>> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
>>>
>>> This outputs (7.2.15 on a CentOS box):
>>> string(0) ""
>>> string(0) ""
>>> string(0) ""
>>> UTF-8
>>> 1
>>> 2
>>>
>>> The default_charset is set but mbstring settings are not, so I'd
>>> expect to get 2 as the character/byte count in both cases.
>>>
>>> If I throw a mb_internal_encoding("iso-8859-1") in the mix, both string
>>> lengths are equal.
>>>
>>> Since the mentioned mbstring directives are deprecated as of 5.6.0 -
>>> do I really need to use mb_internal_encoding() instead?
>>> Is the documentation wrong or am I just misinterpreting it?
>>> I thought that default_charset should act as some kind of "master
>>> setting" in order not to have to set all specific settings as well
>>> (e.g. iconv, mbstring).
>>>
>>> Usually we use UTF-8, so I did not come across this before...
>>>
>>> Any insight?
>>
>> <https://3v4l.org/ZvQ67> confirms the reported behavior. A quick look
>> at the code, too. I suggest you file a ticket on
>> <https://bugs.php.net/>.
>
> Did this lead to a bug report?

Hmm, apparently not.

> It led to a bug in Smarty 3.1.33 for me. I got a warning about
> "mbregex compile err: invalid code point value" in mb_split().
> I have content in ISO-8859-1, and Smarty's normal procedure of
> setting the encoding and the php.ini setting to ISO-8859-1 failed.
>
> However, mb_regex_encoding('ISO-8859-1') did the trick!

While the RFC[1] states

| all functions that take encoding option use php.internal_encoding as
| default (e.g. htmlentities/mb_strlen/mb_regex/etc)

apparently this has not been implemented (yet).

[1] <https://wiki.php.net/rfc/default_encoding>

--
Christoph M. Becker
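
For anyone hitting the same problem in the meantime, here is a minimal sketch of the workaround discussed in this thread: set the mbstring encodings explicitly instead of relying on default_charset to propagate. The helper name sync_mb_encodings() is made up for illustration; it is not part of PHP or Smarty. Only ini_set(), mb_internal_encoding() and mb_regex_encoding() are real PHP functions.

```php
<?php
// Hypothetical helper (not part of PHP or Smarty): push one charset into
// the mbstring settings explicitly instead of relying on default_charset.
function sync_mb_encodings(string $charset): void
{
    ini_set('default_charset', $charset);

    // Internal encoding used by mb_strlen(), mb_substr(), etc.
    mb_internal_encoding($charset);

    // mb_split() and the mb_ereg*() family keep a separate regex encoding;
    // these functions only exist when mbstring was built with mbregex.
    if (function_exists('mb_regex_encoding')) {
        mb_regex_encoding($charset);
    }
}

sync_mb_encodings('ISO-8859-1');

echo mb_internal_encoding(), "\n";  // ISO-8859-1
echo mb_strlen("\xc3\xb6"), "\n";   // 2 (two single-byte characters)
```

Calling this once during bootstrap, before any template code runs, should keep mb_strlen() and mb_split() consistent with the configured charset regardless of whether the running PHP version propagates default_charset on its own.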