On Nov 28, 2023, at 11:12, Claude Pache <claude.pa...@gmail.com> wrote:
> On Nov 28, 2023, at 19:57, Hans Henrik Bergan <divinit...@gmail.com> wrote:
>> With the dominance of UTF-8 (a fixed-endian encoding), surely no new
>> code should utilize any of declare(encoding='...') / zend.multibyte /
>> zend.script_encoding / zend.detect_unicode.
>> I propose we deprecate all 4.
> 
> What is the migration path for legacy code that uses those directives?

Convert your PHP source files to UTF-8. These directives are only required for
code written in legacy multibyte encodings such as Shift-JIS, Big5, or GBK.
(These encodings are primarily used for Chinese and Japanese text.)
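As a minimal sketch of that conversion, assuming the legacy encoding of each
file is already known: re-encoding amounts to decoding the raw bytes with the
legacy codec and encoding the result as UTF-8. The helper names
`convert_source` and `convert_file` are hypothetical, and Python's `shift_jis`
codec is an assumption; files produced by Windows tools may need `cp932`,
which differs from `shift_jis` for some characters. The sketch only re-encodes
bytes and does not touch any declare(encoding='...') statements or ini
settings.

```python
from pathlib import Path


def convert_source(data: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Re-encode raw source bytes from a legacy encoding to UTF-8.

    Decoding is strict, so input that is not actually in `source_encoding`
    raises UnicodeDecodeError instead of being silently corrupted.
    """
    text = data.decode(source_encoding)  # strict error handling by default
    return text.encode("utf-8")


def convert_file(path: Path, source_encoding: str = "shift_jis") -> None:
    """Hypothetical helper: rewrite a single file in place as UTF-8."""
    path.write_bytes(convert_source(path.read_bytes(), source_encoding))


if __name__ == "__main__":
    legacy = "<?php $ボ = 1;".encode("shift_jis")
    print(legacy)                  # Shift-JIS bytes; "ボ" contains 0x7B
    print(convert_source(legacy))  # the same source re-encoded as UTF-8
```

Strict decoding is a deliberate choice here: a file whose encoding was guessed
wrong fails loudly rather than being rewritten with mangled characters.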

These directives are not required for scripts which *process* text in these
encodings. They're only required if the source code itself is in a legacy
multibyte encoding, as those encodings can contain octets in the printable
ASCII range (0x20-0x7E) within multibyte sequences. For example, the character
"ボ" (U+30DC KATAKANA LETTER BO) is encoded in Shift-JIS as 83 7B, whose second
octet would ordinarily represent the ASCII character "{". If this character
appeared in a variable name, for instance, PHP would need to recognize that
the 7B octet is part of "ボ" and not an opening brace.
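To make the Shift-JIS example concrete, this Python sketch encodes a short
snippet in Shift-JIS and scans it one octet at a time, the way a lexer that is
unaware of the encoding would. The function names are hypothetical, and this
is not how PHP's scanner is implemented; it only illustrates why the 0x7B
trail octet is a problem.

```python
def naive_brace_offsets(data: bytes) -> list[int]:
    """Return offsets of every 0x7B ("{") octet, ignoring the encoding."""
    return [i for i, octet in enumerate(data) if octet == ord("{")]


def char_aware_brace_offsets(text: str) -> list[int]:
    """Return character offsets of real "{" characters in decoded text."""
    return [i for i, ch in enumerate(text) if ch == "{"]


snippet = "$ボ = 1;"
sjis = snippet.encode("shift_jis")

print(sjis.hex(" "))                      # 24 83 7b 20 3d 20 31 3b
print(naive_brace_offsets(sjis))          # [2]: a spurious "{"
print(char_aware_brace_offsets(snippet))  # []: no brace in the source text
```

The same snippet encoded as UTF-8 contains no 0x7B octet at all, so the naive
scan finds nothing there.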

>> With the dominance of UTF-8 (a fixed-endian encoding)

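I'll add that what's special about UTF-8 isn't that it's "fixed-endian" (as a
byte-oriented encoding, it has no byte order at all). It's that UTF-8 uses only
octets above 0x7F, lead and continuation octets alike, for characters outside
the ASCII range. An octet in the ASCII range therefore always stands for that
ASCII character, so the parser doesn't have to be specifically aware of UTF-8
when scanning source code.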
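This property can be checked directly. The Python sketch below (helper names
are hypothetical) tests whether every octet in an encoded non-ASCII character
lies above 0x7F, and then checks that for UTF-8 across every non-ASCII Unicode
scalar value (surrogates excluded). Because UTF-8 is stateless, encoding the
characters concatenated yields the same octets as encoding them one by one.

```python
def ascii_safe(ch: str, encoding: str) -> bool:
    """True if no octet of ch's encoding falls in the ASCII range 0x00-0x7F."""
    return all(octet > 0x7F for octet in ch.encode(encoding))


def utf8_is_ascii_safe() -> bool:
    """Check every non-ASCII scalar value at once (surrogates excluded)."""
    non_ascii = "".join(
        chr(cp) for cp in range(0x80, 0x110000) if not 0xD800 <= cp <= 0xDFFF
    )
    return min(non_ascii.encode("utf-8")) > 0x7F


print(utf8_is_ascii_safe())           # True
print(ascii_safe("ボ", "utf-8"))      # True  (E3 83 9C)
print(ascii_safe("ボ", "shift_jis"))  # False (83 7B)
```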
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php