On Tue, Mar 13, 2012 at 1:52 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:

> 2012/3/13 Rasmus Lerdorf <ras...@lerdorf.com>:
> > On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
> >> I thought default_charset became UTF-8, so I was expecting the
> >> following HTTP header.
> >>
> >> content-type  text/html; charset=UTF-8
> >>
> >> However, I got an empty charset (missing 'charset=UTF-8').
> >> So I looked at the source and found this line in SAPI.h:
> >>
> >> 293   #define SAPI_DEFAULT_CHARSET        ""
> >>
> >> The empty string should be "UTF-8", shouldn't it?
> >
> > No, we can't force an output charset on people since it would end up
> > breaking a lot of sites.
>
> Right, so maybe for the next major release? 5.5.0?
>
> As the first XSS advisory in 2000 states, explicitly setting the character
> encoding will prevent certain XSS attacks. Recent browsers have much better
> encoding handling, but setting the encoding explicitly is still better for
> security.
>
> > PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:
> >
> >    if (charset_hint == NULL)
> >                return cs_8859_1;
> >
> > and in 5.4 we have:
> >
> >    if (charset_hint == NULL)
> >                return cs_utf_8;
> >
> > So there is no difference in their guessing when there is no hint, the
> > only difference is that in 5.4 we choose utf8 and in 5.3 we choose
> > 8859-1 in that case.
>
> I got this with 5.3
> <?php
> echo htmlentities('<日本語UTF-8>',ENT_QUOTES);
> echo htmlentities('<日本語UTF-8>',ENT_QUOTES, 'UTF-8');
>
> &lt;&aelig;�&yen;&aelig;�&not;&egrave;&ordf;�UTF8
> &gt;&lt;日本語UTF-8&gt;
>
> So people migrating from 5.3 to 5.4 should not have problems.
> Migrating from versions older than 5.3 to 5.4 will be problematic.
>
> I always set all parameters for htmlentities/htmlspecialchars, so I
> hadn't noticed that this changed from 5.3. Some users may be migrating
> from 5.2 or older. (RHEL5 uses 5.1)
>
> Since PHP does not have a default multibyte module, it may be good to have
>
> input_encoding
> internal_encoding
> output_encoding
>
>
In that case, I would propose making mbstring mandatory at compile time.

I'm against yet another global ini setting. I find the existing ini settings
confusing enough already, and new ones would just mirror mbstring's settings
(and add even more confusion).
Why not make ext/mbstring mandatory at compile time for all future PHP
versions, the way pcre and spl are?

We need multibyte handling anyway. The Zend Engine already takes advantage of
mbstring for internal encoding, so I have probably missed something: why is it
still possible to --disable-mbstring (or to omit --enable-mbstring) when
compiling? Does it have a huge performance impact?
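
To illustrate what an optional mbstring costs userland code today, here is a minimal sketch of the kind of guard a portable script ends up writing. The helper name utf8_length() is hypothetical (it is not from this thread or from PHP itself), and the fallback assumes PCRE was built with UTF-8 support:

```php
<?php
// Hypothetical helper: count the characters in a UTF-8 string whether or
// not ext/mbstring was compiled in.
function utf8_length($s)
{
    if (extension_loaded('mbstring')) {
        // Pass the encoding explicitly instead of relying on ini settings.
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: ext/pcre is always present. With the /u modifier, "."
    // matches one UTF-8 character; the /s modifier lets it match newlines.
    // preg_match_all() returns false if $s is not valid UTF-8.
    return preg_match_all('/./us', $s, $matches);
}

var_dump(utf8_length('日本語UTF-8')); // int(8)
```

If mbstring were always compiled in, the fallback branch and the extension_loaded() check would simply go away.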

Thank you :)

Julien.P
