On 04/09/2025 07:44, Jorg Sowa wrote:
> + enigmatic "all functions that take encoding option use
> php.internal_encoding as default (e.g.
> htmlentities/mb_strlen/mb_regex/etc)"
This is not very well worded, but I believe what it's saying is that a
number of functions have an "encoding" parameter, which defaults to some
INI setting. Since some of those settings were proposed to be removed,
and others added, they needed a new default.
However, further up it seems to propose a different default:
> Use default_charset as default for encoding related php.ini settings
> and module/functions.
What's more, there isn't actually a setting called
"php.internal_encoding", it's just called "internal_encoding".
Regardless of what was intended 11 years ago, we need to decide what to
do now.
htmlentities is currently documented like this
[https://www.php.net/htmlentities]:
> An optional argument defining the encoding used when converting
> characters.
> If omitted, encoding defaults to the value of the default_charset
> configuration option.
So, I guess that doesn't need to change, because that setting isn't
deprecated?
The standard wording for all the ext/mbstring functions is currently
this [e.g. https://www.php.net/manual/en/function.mb-convert-case.php]:
> The encoding parameter is the character encoding. If it is omitted or
> null, the internal character encoding value will be used.
This is rather vague. In practice, it takes the value of
"mbstring.internal_encoding"; if not set, "internal_encoding"; if not
set, "default_charset" - but I can't find anywhere in the manual stating
this directly.
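
To make that fallback order concrete, here is a minimal sketch of the
lookup as I described it above. The helper name
resolve_mb_default_encoding() is hypothetical and not part of PHP; it
takes the settings as an array so the order can be checked without
touching real INI state, and the final 'UTF-8' fallback is my own
assumption for the case where all three are empty.

```php
<?php
// Hypothetical helper, not part of PHP: mirrors the fallback order
// described in the paragraph above.
function resolve_mb_default_encoding(array $settings): string
{
    $order = [
        'mbstring.internal_encoding',
        'internal_encoding',
        'default_charset',
    ];
    foreach ($order as $name) {
        // ini_get() can return false for unknown settings, so normalise.
        $value = (string) ($settings[$name] ?? '');
        if ($value !== '') {
            return $value;
        }
    }
    // Assumption: fall back to UTF-8 when nothing is configured.
    return 'UTF-8';
}

// Usage with the live configuration:
$live = [];
foreach (['mbstring.internal_encoding', 'internal_encoding', 'default_charset'] as $name) {
    $live[$name] = ini_get($name);
}
$resolved = resolve_mb_default_encoding($live);
```

Only the first non-empty setting wins, so a stale value in
"mbstring.internal_encoding" would hide anything set further down the
chain.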
> Without INI values following functions are becoming pointless:
> - iconv_set_encoding
> - iconv_get_encoding
> - mb_internal_encoding
> - mb_http_output
Neither of the RFCs explicitly mentions any of these functions, and none
of them has a deprecation note in the manual.
iconv_set_encoding triggers a deprecation notice at run-time, but
iconv_get_encoding does not.
mb_internal_encoding does not trigger any deprecation notice, even when
it's used to set a new value. Until 5 years ago, it was also heavily
implied to be the correct function to use: until the synopses for
mbstring functions in the manual were rebuilt from stub files, they
looked like this:
> mb_convert_case ( string $str , int $mode [, string $encoding =
> mb_internal_encoding() ] ) : string
I suspect mb_internal_encoding is rather widely used to set a *run-time*
default, unrelated to the INI settings. If we want to remove it, let's
go all the way and make the encoding parameter mandatory in some future
version, rather than defaulting to a different global value.
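
As a rough illustration of the difference, assuming ext/mbstring is
loaded, the sketch below compares passing the encoding explicitly with
relying on the run-time default set through mb_internal_encoding. The
variable names are only for illustration.

```php
<?php
// Requires ext/mbstring. "\xC3\xA9" is the UTF-8 byte sequence for "é".
$text = "\xC3\xA9";

// Explicit encoding: the result does not depend on any global state.
$explicitUtf8   = mb_strlen($text, 'UTF-8');      // one character
$explicitLatin1 = mb_strlen($text, 'ISO-8859-1'); // two single-byte characters

// Implicit encoding: the result follows whatever run-time default
// was set most recently, possibly by unrelated code.
$previous = mb_internal_encoding();
mb_internal_encoding('ISO-8859-1');
$implicit = mb_strlen($text);
mb_internal_encoding($previous); // restore the earlier default
```

With the parameter mandatory, only the first two calls would remain
possible, and the same input could no longer give different answers
depending on earlier calls elsewhere in the program.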
--
Rowan Tommins
[IMSoP]