Hi, I think the following PHP 5.4.0 NEWS entry is misleading.

  . Changed default value of "default_charset" php.ini option from ISO-8859-1
    to UTF-8. (Rasmus)

I understood this to mean that default_charset became UTF-8, so I was
expecting the following HTTP header:

  content-type: text/html; charset=UTF-8

However, I got an empty charset (the 'charset=UTF-8' part was missing). So I
looked at the source and found this line in SAPI.h (line 293):

  #define SAPI_DEFAULT_CHARSET ""

The empty string should be "UTF-8", shouldn't it?

BTW, an empty charset in the HTTP header does not mean the default will be
ISO-8859-1; it lets the browser guess which encoding is used. Guessing the
encoding may cause XSS under certain conditions.

Anyway, I was curious, so I checked ext/standard/html.c and found:

  /* {{{ entity_charset determine_charset
   * returns the charset identifier based on current locale or a hint.
   * defaults to UTF-8 */
  static enum entity_charset determine_charset(char *charset_hint TSRMLS_DC)
  {
      int i;
      enum entity_charset charset = cs_utf_8;
      int len = 0;
      const zend_encoding *zenc;

      /* Default is now UTF-8 */
      if (charset_hint == NULL)
          return cs_utf_8;

There are 2 problems.

 - php.ini's default_charset should be UTF-8.
 - determine_charset() should not blindly default to UTF-8 when there is
   no hint.

The old htmlentities/htmlspecialchars actually determined the charset from
default_charset/mbstring.internal_encoding/etc. I think the old behavior is
better than the current one.

How about making determine_charset() behave like 5.3 and setting
SAPI_DEFAULT_CHARSET to "UTF-8"? Then PHP will behave as NEWS says: the
default encoding of htmlentities/htmlspecialchars becomes 'UTF-8', and users
will have control over the default htmlentities/htmlspecialchars encoding.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
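
P.S. To illustrate the fallback order proposed above, here is a rough, self-contained C sketch. It is not the PHP source and not a patch: the names sketch_resolve_charset, sketch_content_type and SKETCH_DEFAULT_CHARSET are hypothetical, and the lookup is simplified to "explicit hint, then the default_charset ini value, then a compiled-in UTF-8 default". It ignores mbstring.internal_encoding and the locale, which the 5.3 code also consulted.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for SAPI_DEFAULT_CHARSET, set to "UTF-8" as the
 * post proposes (the 5.4.0 source quoted above defines it as ""). */
#define SKETCH_DEFAULT_CHARSET "UTF-8"

/* Pick a charset name: an explicit hint wins, then the default_charset
 * ini value, then the compiled-in default. NULL and "" both count as
 * "not set". Assumption: a simplified version of the 5.3-style lookup. */
const char *sketch_resolve_charset(const char *hint,
                                   const char *ini_default_charset)
{
    if (hint != NULL && hint[0] != '\0') {
        return hint;
    }
    if (ini_default_charset != NULL && ini_default_charset[0] != '\0') {
        return ini_default_charset;
    }
    return SKETCH_DEFAULT_CHARSET;
}

/* Write a Content-Type value into buf. When charset is NULL or empty the
 * "; charset=" parameter is left out entirely, which is the case where
 * the browser has to guess the encoding. Returns snprintf's result. */
int sketch_content_type(char *buf, size_t size,
                        const char *mimetype, const char *charset)
{
    if (charset == NULL || charset[0] == '\0') {
        return snprintf(buf, size, "%s", mimetype);
    }
    return snprintf(buf, size, "%s; charset=%s", mimetype, charset);
}
```

With this ordering, a script that passes no hint and sets no default_charset would still get "text/html; charset=UTF-8", while a user who sets default_charset in php.ini keeps control over both the header and the htmlentities/htmlspecialchars default.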