On Nov 28, 2023, at 11:12, Claude Pache <claude.pa...@gmail.com> wrote:
> On Nov 28, 2023, at 19:57, Hans Henrik Bergan <divinit...@gmail.com> wrote:
>> With the dominance of UTF-8 (a fixed-endian encoding), surely no new
>> code should utilize any of declare(encoding='...') / zend.multibyte /
>> zend.script_encoding / zend.detect_unicode.
>> I propose we deprecate all 4.
>
> What is the migration path for legacy code that use those directives?

Convert your PHP source files to UTF-8.

These directives are only required for code written in legacy multibyte
encodings like Shift-JIS, Big5, or EUC-CN. (These encodings are primarily
used for Chinese and Japanese text.)

These directives are not required for scripts which *process* text in
these encodings. They're only required if the source code itself is in a
legacy multibyte encoding, as several of those encodings (Shift-JIS and
Big5, for instance) can contain octets in the basic ASCII range
(0x20 - 0x7F) within multibyte sequences.

For example, the character "ボ" (U+30DC KATAKANA LETTER BO) is encoded in
Shift-JIS as 83 7B, whose second octet would ordinarily represent the
ASCII character "{". If this character appeared in a variable name, for
instance, PHP would need to recognize that the "7B" does not represent an
open brace.

>> With the dominance of UTF-8 (a fixed-endian encoding)

I'll add that what's special about UTF-8 isn't that it's "fixed-endian".
It's that UTF-8 encodes every character outside the ASCII range using
only octets above 0x7F, so the parser doesn't have to be specifically
aware of UTF-8 encoding when processing text.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
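
The octet values in the Shift-JIS example can be checked directly. The following is a minimal sketch in Python (not PHP, and not a model of how PHP's lexer works); the helper name `ascii_range_octets` is made up for illustration. It encodes "ボ" both ways and lists any octets at or below 0x7F, which are the ones an encoding-unaware parser could mistake for ASCII syntax.

```python
# Illustrative check only: compare how one character is encoded in
# Shift-JIS and in UTF-8, and look for octets in the ASCII range.
BO = "\u30dc"  # U+30DC KATAKANA LETTER BO


def ascii_range_octets(text, encoding):
    """Hypothetical helper: return encoded octets that are <= 0x7F."""
    return [octet for octet in text.encode(encoding) if octet <= 0x7F]


sjis = BO.encode("shift_jis")
utf8 = BO.encode("utf-8")

print(sjis.hex(" "))                       # 83 7b
print(utf8.hex(" "))                       # e3 83 9c
print(ascii_range_octets(BO, "shift_jis"))  # [123], i.e. ord("{")
print(ascii_range_octets(BO, "utf-8"))      # []
```

In the Shift-JIS form the trailing octet is indistinguishable from "{" unless the reader tracks multibyte state. In the UTF-8 form no octet falls in the ASCII range, so a byte-oriented parser can pass it through untouched.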