On 04/09/2025 07:44, Jorg Sowa wrote:

+ enigmatic "all functions that take encoding option use php.internal_encoding as default (e.g. htmlentities/mb_strlen/mb_regex/etc)"


This is not very well worded, but I believe what it's saying is that a number of functions have an "encoding" parameter, which defaults to some INI setting. Since some of those settings were proposed to be removed, and others added, they needed a new default.

However, further up it seems to propose a different default:

Use default_charset as default for encoding related php.ini settings and module/functions.

What's more, there isn't actually a setting called "php.internal_encoding", it's just called "internal_encoding".


Regardless of what was intended 11 years ago, we need to decide what to do now.



htmlentities is currently documented like this [https://www.php.net/htmlentities]:

An optional argument defining the encoding used when converting characters.

If omitted, encoding defaults to the value of the default_charset configuration option.

So, I guess that doesn't need to change, because that setting isn't deprecated?




The standard wording for all the ext/mbstring functions is currently this [e.g. https://www.php.net/manual/en/function.mb-convert-case.php]:

The encoding parameter is the character encoding. If it is omitted or null, the internal character encoding value will be used.


This is rather vague. In practice, it takes the value of "mbstring.internal_encoding"; if that is not set, "internal_encoding"; and if that is not set either, "default_charset". However, I can't find anywhere in the manual that states this directly.
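To make that lookup order concrete, here is a rough sketch. It only models the chain as described above; it is not the actual mbstring source, and the function name resolve_default_encoding() is made up for illustration. It also assumes that an empty string means "not set".

```php
<?php
// Hypothetical helper: models the fallback order described above.
// This is NOT how ext/mbstring is implemented internally.
const ENCODING_FALLBACK_ORDER = [
    'mbstring.internal_encoding',
    'internal_encoding',
    'default_charset',
];

function resolve_default_encoding(array $ini): ?string
{
    foreach (ENCODING_FALLBACK_ORDER as $name) {
        $value = $ini[$name] ?? '';
        // Assumption: an empty string (or false from ini_get) means "not set".
        if (is_string($value) && $value !== '') {
            return $value;
        }
    }
    return null; // nothing configured at all
}

// Feed it the live settings; ini_get() returns false for unknown settings.
$live = [];
foreach (ENCODING_FALLBACK_ORDER as $name) {
    $live[$name] = ini_get($name);
}
echo resolve_default_encoding($live) ?? '(none)', PHP_EOL;
```

If the chain is as described, the printed value should match what mb_internal_encoding() returns on an installation where it has not been changed at run-time.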


Without INI values following functions are becoming pointless:
- iconv_set_encoding
- iconv_get_encoding
- mb_internal_encoding
- mb_http_output


Neither of the RFCs explicitly mentions any of these functions, and none of them has a deprecation note in the manual.

iconv_set_encoding triggers a deprecation notice at run-time, but iconv_get_encoding does not.
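A quick way to check this on a given PHP version is to count the E_DEPRECATED notices raised by each call. This is just a sketch; it assumes ext/iconv is loaded, and the variable names are mine.

```php
<?php
// Collect deprecation messages instead of printing them.
$deprecations = [];
set_error_handler(
    function (int $errno, string $errstr) use (&$deprecations): bool {
        $deprecations[] = $errstr;
        return true; // handled, do not fall through to the default handler
    },
    E_DEPRECATED
);

iconv_set_encoding('internal_encoding', 'UTF-8');
$afterSet = count($deprecations);

iconv_get_encoding('internal_encoding');
$afterGet = count($deprecations);

restore_error_handler();

printf("notices after set: %d, after get: %d\n", $afterSet, $afterGet);
```

If the behaviour is as described above, the first count is at least 1 and the second count is unchanged.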

mb_internal_encoding does not trigger any deprecation notice, even when it's used to set a new value. Until 5 years ago, the manual also heavily implied that it was the correct function to use: until the pages were rebuilt from stub files, the synopses for mbstring functions looked like this:


 mb_convert_case ( string $str , int $mode [, string $encoding = mb_internal_encoding() ] ) : string


I suspect mb_internal_encoding is rather widely used to set a *run-time* default, unrelated to the INI settings. If we want to remove it, let's go all the way and make the encoding parameter mandatory in some future version, rather than defaulting to a different global value.
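For illustration, this is the kind of usage I mean, next to the explicit alternative. It is a minimal sketch that assumes ext/mbstring is loaded; the sample string is arbitrary.

```php
<?php
// "h\u{E9}llo" is 5 characters, but 6 bytes when encoded as UTF-8.
$text = "h\u{E9}llo";

// Run-time default: every later call without an encoding argument is affected.
$previous = mb_internal_encoding();
mb_internal_encoding('ISO-8859-1');
$implicit = mb_strlen($text);          // relies on the global value
$explicit = mb_strlen($text, 'UTF-8'); // independent of any global state
mb_internal_encoding($previous);       // restore whatever was set before

var_dump($implicit); // int(6): each byte counted as one ISO-8859-1 character
var_dump($explicit); // int(5)
```

Making the parameter mandatory would force every call into the second form, so the result could no longer change depending on what some other piece of code set globally.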


--
Rowan Tommins
[IMSoP]
