Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?

Rowan Tommins Mon, 22 Mar 2021 08:41:36 -0700

On 22/03/2021 15:04, Aleksander Machniak wrote:

> I'm using utf8_encode()/utf8_decode() to make an input string safe to be
> stored in the DB, and back. In most cases the input is UTF-8, but it
> occasionally may contain "broken characters".

That is not what this function does, at all. The fact that its name makes you think that is exactly why I want to get rid of that name.

>     $str = "グーグル谷歌中信фδοκιμήóźdźрöß😁😃";
>
>     $this->assertSame($str, utf8_decode(utf8_encode($str)));



Let's write that out with a more descriptive function name:

$str = "グーグル谷歌中信фδοκιμήóźdźрöß😁😃";

$this->assertSame($str, utf8_to_latin1(latin1_to_utf8($str)));

Since Latin-1 does not contain any Chinese, Japanese, or emoji characters, running latin1_to_utf8 on that string is clearly nonsensical.

The only reason it doesn't give you any errors is that every possible byte is a valid character in Latin-1, and every Latin-1 character has a Unicode code point. So "グ", which is stored as the three UTF-8 bytes E3 82 B0, is interpreted as three Latin-1 characters: E3, 82, and B0. Those then become the corresponding Unicode code points U+00E3, U+0082, and U+00B0, each represented in UTF-8. You then run utf8_to_latin1, and they get converted back to the original bytes.
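To make that byte arithmetic concrete, here is a small PHP sketch (variable names are illustrative only). It takes the UTF-8 bytes of "グ", treats each byte as a Latin-1 character (so the code point equals the byte value), and re-encodes that code point in UTF-8 by hand using the two-byte pattern 110xxxxx 10xxxxxx.

```php
<?php
$char = "グ"; // U+30B0, stored in UTF-8 as the bytes E3 82 B0

$lines = [];
foreach (str_split($char) as $byte) {     // str_split() splits by byte
    $cp = ord($byte);                     // as Latin-1: code point == byte value
    // All three bytes are >= 0x80, so each code point needs two UTF-8 bytes.
    $utf8 = chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    $lines[] = sprintf("%02X -> U+%04X -> %s", $cp, $cp, strtoupper(bin2hex($utf8)));
}

echo implode("\n", $lines), "\n";
```

This prints `E3 -> U+00E3 -> C3A3`, `82 -> U+0082 -> C282` and `B0 -> U+00B0 -> C2B0`: one three-byte character has turned into three unrelated two-byte characters.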


That code will never do anything useful.
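A minimal pure-PHP sketch of that round trip, using the hypothetical names latin1_to_utf8() and utf8_to_latin1() from this message. They are not real PHP functions, and this is a simplified illustration, not PHP's own implementation of utf8_encode()/utf8_decode(). It shows that every byte comes back unchanged, including a deliberately truncated (hypothetical) "broken" input, so nothing is made safe or repaired along the way.

```php
<?php
// Hypothetical helper: treat each byte as a Latin-1 character
// (code point == byte value) and encode that code point as UTF-8.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $len = strlen($bytes); $i < $len; $i++) {
        $cp = ord($bytes[$i]);
        $out .= $cp < 0x80
            ? $bytes[$i]                                          // ASCII: 1 byte
            : chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));  // U+0080..U+00FF: 2 bytes
    }
    return $out;
}

// Hypothetical helper: turn U+0000..U+00FF back into single bytes.
// Simplification: any other byte becomes '?' (PHP's real utf8_decode()
// uses '?' per unrepresentable character, not per byte).
function utf8_to_latin1(string $utf8): string
{
    $out = '';
    for ($i = 0, $len = strlen($utf8); $i < $len; $i++) {
        $b = ord($utf8[$i]);
        if ($b < 0x80) {
            $out .= $utf8[$i];
        } elseif (($b === 0xC2 || $b === 0xC3) && $i + 1 < $len
            && (ord($utf8[$i + 1]) & 0xC0) === 0x80) {
            $out .= chr((($b & 0x1F) << 6) | (ord($utf8[++$i]) & 0x3F));
        } else {
            $out .= '?';
        }
    }
    return $out;
}

$str = "グーグル谷歌中信фδοκιμήóźdźрöß😁😃";
$intermediate = latin1_to_utf8($str);             // every non-ASCII byte is now two bytes
var_dump(utf8_to_latin1($intermediate) === $str); // bool(true)

// Hypothetical "broken" input: a UTF-8 sequence cut off after two bytes.
$broken = "abc\xE3\x82";
var_dump(utf8_to_latin1(latin1_to_utf8($broken)) === $broken); // bool(true): unchanged
var_dump(preg_match('//u', $broken));                          // bool(false): still invalid UTF-8
```

The round trip is the identity on any byte string, which is exactly why it cannot clean up "broken characters": invalid input goes in and the same invalid input comes out.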

Regards,

--
Rowan Tommins
[IMSoP]
