Hi,

It is well-known that it is impossible to map e.g. a 32-bit random
number with a uniform distribution over its full range of values onto a
range with fewer different values while maintaining a uniform
distribution, except when the target range contains a whole power of 2
number of different values.

Prior to 7.1.0, mt_rand()'s mapping onto the target range was done with
some floating-point math, so there was no modulo division exposed in
code, yet the problem above is fundamental and indeed it applied, e.g.:

https://3v4l.org/ihCT8

<?php
// PHP pre-7.1.0 modulo bias demo
mt_srand(1234567890);
$total = 100000;
$max = 0x66666666;
$halves[0] = $halves[1] = 0;
for ($i = 0; $i < $total; $i++) {
    $halves[(mt_rand(0, $max - 1) >> 1) & 1]++;
}
printf("%.1f%% vs. %.1f%%\n", 100. * $halves[0] / $total, 100. * $halves[1] / 
$total);
?>

Output for 7.1.0 - 7.2.0beta2
    49.9% vs. 50.1%
Output for 5.2.1 - 5.6.30, hhvm-3.10.1 - 3.21.0, 7.0.0 - 7.0.20
    40.0% vs. 60.0%
Output for 4.3.0 - 5.2.0
    39.8% vs. 60.2%

PHP 7.1.0 moved to explicit integer modulo division yet tried to avoid
the modulo bias by generating multiple original 32-bit or 64-bit random
numbers in cases where the modulo bias would occur.  Unfortunately,
because of an implementation bug it failed, e.g.:

https://3v4l.org/kBpqh

<?php
// PHP 7.1.0 to 7.2.0beta2 modulo bias bug found during work on 
http://www.openwall.com/php_mt_seed/
mt_srand(1234567890);
$total = 100000;
$max = 0x66666666;
$halves[0] = $halves[1] = 0;
for ($i = 0; $i < $total; $i++) {
    $halves[mt_rand(0, $max - 1) / ($max / 2)]++;
}
printf("%.1f%% vs. %.1f%%\n", 100. * $halves[0] / $total, 100. * $halves[1] / 
$total);
?>

Output for 7.1.0 - 7.2.0beta2
    60.0% vs. 40.0%
Output for 4.3.0 - 5.6.30, hhvm-3.10.1 - 3.21.0, 7.0.0 - 7.0.20
    50.0% vs. 50.0%

(Even higher bias can be seen in both cases by changing the 0x66666666
to 0xaaaaaaaa.)

I first found the bug through code review while working on adding PHP
7.1.0+ support to php_mt_seed.  The above 3v4l was only to confirm it.

As I said in the Twitter thread (certainly not a proper place):

https://twitter.com/solardiz/status/897617315008839686

"The bug is unconditional use of 64-bit ZEND_ULONG_MAX in the bias
avoidance, even when the intermediate result is 32-bit (up to UINT32_MAX)."

Leigh confirmed this understanding, tweeting:

"Yep, happens when (max - min) is less than UINT32_MAX. A check on
(umax > UINT32_MAX) and setting the limit fixes it."

This sounds like almost the correct fix to me.  The code is:

PHPAPI zend_long php_mt_rand_range(zend_long min, zend_long max)
{
        zend_ulong umax = max - min;
        zend_ulong limit;
        zend_ulong result;

        result = php_mt_rand();
#if ZEND_ULONG_MAX > UINT32_MAX
        if (umax > UINT32_MAX) {
                result = (result << 32) | php_mt_rand();
        }
#endif

        /* Special case where no modulus is required */
        if (UNEXPECTED(umax == ZEND_ULONG_MAX)) {
                return (zend_long)result;
        }

        /* Increment the max so the range is inclusive of max */
        umax++;

        /* Powers of two are not biased */
        if (EXPECTED((umax & (umax - 1)) != 0)) {
                /* Ceiling under which ZEND_LONG_MAX % max == 0 */
                limit = ZEND_ULONG_MAX - (ZEND_ULONG_MAX % umax) - 1;

                /* Discard numbers over the limit to avoid modulo bias */
                while (UNEXPECTED(result > limit)) {
#if ZEND_ULONG_MAX > UINT32_MAX
                        if (umax > UINT32_MAX) {
                                result = (result << 32) | php_mt_rand();
                        }
                        else {
                                result = php_mt_rand();
                        }
#else
                        result = php_mt_rand();
#endif
                }
        }

        return (zend_long)((result % umax) + min);
}

and what Leigh means is something like:

                /* Ceiling under which *_MAX % max == 0 */
                if (umax > UINT32_MAX)
                        limit = ZEND_ULONG_MAX - (ZEND_ULONG_MAX % umax) - 1;
                else
                        limit = UINT32_MAX - (UINT32_MAX % umax) - 1;

Strictly speaking, the check should be the same as the one earlier in
the function, which decided how "result" was obtained with one or two
calls to php_mt_rand().  The above proposed check isn't exactly the same
because of the "umax++" inbetween.  The special case of "umax" being
exactly UINT32_MAX prior to the increment needs more thought.  Probably
the check in the first instance of:

        if (umax > UINT32_MAX) {
                result = (result << 32) | php_mt_rand();
        }

is buggy and should actually be:

        if (umax >= UINT32_MAX) {
                result = (result << 32) | php_mt_rand();
        }

whereas further code should continue with "umax > UINT32_MAX" because of
the increment.

Also, why even bother to support ranges beyond 32-bit?  Sounds like a
misfeature to me, considering it won't(?) be universally available on
all PHP builds anyway (not on 32-bit ones, right?) and thus shouldn't(?)
be relied upon by applications (although it might become reasonable for
application developers not to care about 32-bit soon).  I also see few
use cases for it, even if it were universally available.

The main bug reported in this message (wrongly using ZEND_ULONG_MAX for
computing "limit" when "result" is 32-bit) also means that PHP 7.1.0 to
7.2.0beta2 probably produce partially different sequences of mt_rand()
outputs between 32- and 64-bit builds for the same seed, for most ranges
that fit within 32 bits.  Specifically, 32-bit builds probably correctly
avoid the modulo bias.  (However, I didn't confirm this with a test.
Unfortunately, 3v4l.org appears to be 64-bit only.)  Possibility of bugs
like this is yet another reason not to have introduced the little-needed
64-bit build specific functionality.  OTOH, this was also a (missed)
opportunity to detect this bug with more testing (requiring same
behavior of 32- and 64-bit builds for 32-bit inputs).

Alexander

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to