Hi list,

I'm currently migrating an old application from PHP 5.6 to 7.2.
While doing so, I experimented with the default_charset ini setting.

The documentation states (see
https://www.php.net/manual/en/ini.core.php#ini.default-charset):

"In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
default_charset will also be used to set the default character set for
[...] and for mbstring functions if the mbstring.http_input
mbstring.http_output mbstring.internal_encoding configuration option is
unset."

As such, I'd expect to be able to set default_charset to iso-8859-1 and
have mbstring pick up that same setting for its internal encoding (as long
as the mentioned directives are unset, that is).

This does not seem to be the case:

<?php
ini_set( 'default_charset', 'iso-8859-1' );
var_dump( ini_get("mbstring.internal_encoding") );
var_dump( ini_get("mbstring.http_input") );
var_dump( ini_get("mbstring.http_output") );
echo mb_internal_encoding() . "\n";
echo mb_strlen( "\xc3\xb6" ) . "\n";
echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";

This outputs (7.2.15 on a CentOS box):
string(0) ""
string(0) ""
string(0) ""
UTF-8
1
2

default_charset is set but the mbstring settings are not, so I'd expect
both mb_strlen() calls to return 2, i.e. the character count to equal the
byte count.

If I add a call to mb_internal_encoding("iso-8859-1") before the
mb_strlen() calls, both string lengths are equal.
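
For reference, a minimal sketch of that workaround, assuming the mbstring
extension is loaded. It is the same test string as above with the explicit
call added; the per-call 'UTF-8' argument on the last line is only there
for comparison and is not part of my original script:

```php
<?php
// Setting default_charset at runtime did not change mbstring's
// internal encoding in my test, so set it explicitly as well.
ini_set('default_charset', 'iso-8859-1');
mb_internal_encoding('ISO-8859-1');

// Two bytes; a single character only when interpreted as UTF-8.
$bytes = "\xc3\xb6";

echo mb_internal_encoding() . "\n";      // ISO-8859-1
echo mb_strlen($bytes) . "\n";           // 2 (uses the internal encoding)
echo mb_strlen($bytes, '8bit') . "\n";   // 2 (raw byte count)
echo mb_strlen($bytes, 'UTF-8') . "\n";  // 1 (encoding passed per call)
```

Passing the encoding per call, as in the last line, would avoid relying on
any global setting, at the cost of touching every mb_* call site.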

Since the mentioned mbstring directives are deprecated as of 5.6.0, do I
really need to call mb_internal_encoding() instead?
Is the documentation wrong, or am I just misinterpreting it? I thought that
default_charset was meant to act as a kind of "master setting", so that the
extension-specific settings (e.g. for iconv and mbstring) would not have to
be set individually.

Usually we use UTF-8, so I did not come across this before...

Any insight?

Greetings

Nico
