 ID:               46478
 Updated by:       [EMAIL PROTECTED]
 Reported By:      for-bugs at hnw dot jp
-Status:           Open
+Status:           Analyzed
 Bug Type:         Feature/Change Request
 Operating System: *
 PHP Version:      5.2.6
-Assigned To:
+Assigned To:      moriyoshi
 New Comment:
I think this is a bug, but correcting the table would break BC too.


Previous Comments:
------------------------------------------------------------------------

[2008-11-04 12:56:40] for-bugs at hnw dot jp

Description:
------------
ext/standard/html.c has an incorrect mapping table, which
htmlentities() uses.

html.c is based on
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/UNI2SGML.TXT, but this
mapping table is obsolete and not compatible with HTML 4.0 or XHTML 1.0.

For example, U+2235 (encoded as "\xe2\x88\xb5" in UTF-8) is not in
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent, but htmlentities()
returns "&becaus;" for it. U+226A (≪) and U+226B (≫) are similar cases.

Reproduce code:
---------------
<?php
var_dump(htmlentities("\xe2\x88\xb5", ENT_QUOTES, "utf-8"));

Expected result:
----------------
string(3) "∵"

Actual result:
--------------
string(8) "&becaus;"


------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=46478&edit=1