Hi internals!

I'd like to change our double-to-string casting behavior to be
locale-independent and would appreciate some opinions as to whether you
consider this feasible.

So, first off, this is how PHP currently behaves:

    <?php setlocale(LC_ALL, 'de_DE');
    var_dump((string) 3.14);
    // string(4) "3,14"

The de_DE locale uses "," as the decimal separator (rather than ".") and
PHP makes use of this information when casting floating point numbers to
string.

That may seem like a nice feature, but in practice it causes a lot of
issues: while PHP has no problem using "," when outputting floats, nothing
(*including PHP itself*) actually accepts that format.

E.g. if you have a floating point number and cast it to a string you will
*NOT* be able to cast the string back to a float, because PHP can't handle
the comma. This breaks PHP's usual paradigm of "numeric strings should
behave the same way as floats/ints".

    <?php
    setlocale(LC_ALL, 'de_DE');
    $float = 3.14;
    $string = (string) $float;   // "3,14"
    $newFloat = (float) $string; // parsing stops at the comma
    var_dump($newFloat);
    // float(3)
    // WTF???

But this issue is not specific to PHP's own (float) cast. Practically no
protocols, APIs, etc. accept floating point numbers with a comma.

Some examples:

 1. If you create a MySQL query and put in a double value like so:

    $query = "INSERT INTO ... VALUES ($double)";
    // assume that $double is guaranteed to be a double here

    I think the vast majority of developers would assume that the above
code works correctly and is secure (provided that $double really is a
double and you verified that). But that's not true. With a comma-locale like
de_DE the double will be output with a comma, so you'll end up with
something like this:

    "INSERT INTO ... VALUES (3.14)" // normal locale and expected behavior
    "INSERT INTO ... VALUES (3,14)" // comma-locale and unexpected behavior

    Not only does a change in locale break the code, it actually completely
changes semantics (a tuple with one floating point value becomes a tuple
with two integer values).

 2. The example that brought this issue to my attention again today is that
our own BCMath extension breaks down when you use it with floating point
values and a comma-locale (https://bugs.php.net/bug.php?id=55160).

 3. Another case where things can seriously go wrong is outputting doubles
in the generation of code (be it PHP for caching purposes or JS for the
client). To get around the issue you usually need to introduce some very
ugly code that changes the LC_NUMERIC locale to 'C'. E.g. this is what Twig
uses in its code generator:

            if (false !== $locale = setlocale(LC_NUMERIC, 0)) {
                setlocale(LC_NUMERIC, 'C');
            }

            $this->raw($value);

            if (false !== $locale) {
                setlocale(LC_NUMERIC, $locale);
            }

    In this case (just like with MySQL) you will not just emit broken
code; it can end up being working code with totally different semantics
(as "," is usually a function argument separator).
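For illustration, the Twig-style workaround above could be wrapped in a
small helper so that the locale switch lives in one place. This is only a
sketch: the function name floatToPortableString is made up for this
example, and it assumes a PHP version that supports try/finally and scalar
type declarations.

```php
<?php

/**
 * Hypothetical helper: casts a float to a string that uses "." as the
 * decimal separator, whatever LC_NUMERIC is currently set to.
 */
function floatToPortableString(float $value): string
{
    // Passing "0" only queries the current locale; it changes nothing.
    $previous = setlocale(LC_NUMERIC, '0');
    if (false !== $previous) {
        setlocale(LC_NUMERIC, 'C');
    }

    try {
        return (string) $value;
    } finally {
        // Put the caller's locale back before returning.
        if (false !== $previous) {
            setlocale(LC_NUMERIC, $previous);
        }
    }
}

var_dump(floatToPortableString(3.14)); // string(4) "3.14"
```

Every call site that builds a query or generates code would have to
remember to go through such a helper, which is exactly the kind of burden
the proposed change would remove.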

These are just three random examples I came up with, but I've seen this
issue a lot of times. The insidious thing about it is that, with very high
probability, you will not notice it during development (because you don't
use locales); it will only turn up later.

So, my suggestion is to change the (string) cast to always use "." as the
decimal separator, independent of locale. The patch for this is very
simple: it only needs to change a few occurrences of "%.*G" to "%.*H".
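The "%.*G" and "%.*H" formats belong to PHP's internal printf
implementation and are not available to userland code. As a rough analogy
only, the userland sprintf() has a similar pair of specifiers: "%f" is
locale-aware, while "%F" is not. The sketch below shows the
locale-independent variant; the variable names are made up for the example.

```php
<?php
$value = 3.14;

// "%f" respects LC_NUMERIC, so its decimal separator depends on the locale.
$localeAware = sprintf('%.2f', $value);

// "%F" ignores the locale and always uses "." as the decimal separator.
$localeIndependent = sprintf('%.2F', $value);

var_dump($localeIndependent); // string(4) "3.14"
```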

I think not having the locale-dependent output won't be much of a loss for
anyone, because if you need to actually localize the output of your
numbers, it is very likely that just replacing the decimal separator is not
enough (you will at least want to have a thousands-separator as well, i.e.
you want to use number_format).
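As a short sketch of what proper localization tends to look like,
number_format() takes the decimal and thousands separators explicitly and
so does not depend on the current locale at all. The sample value and the
variable names are made up for the example.

```php
<?php
$amount = 1234567.891;

// Arguments: number, decimals, decimal separator, thousands separator.
$german  = number_format($amount, 2, ',', '.');
$english = number_format($amount, 2, '.', ',');

echo $german, "\n";  // 1.234.567,89
echo $english, "\n"; // 1,234,567.89
```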

So, thoughts?

Nikita
(Sorry for the long rant)
