Hi
On 2/15/22 04:58, Go Kudo wrote:
Regarding "unintuitive": I disagree. I find it unintuitive that there are
some RNG sequences that I can't access when providing a seed.
This is also the case for RNG implementations in many other languages. For
example, Java also takes a long (64-bit) as the seed argument of
java.util.Random.
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Random.html#%3Cinit%3E(long)
java.util.Random is an LCG with only 48 bits of state. A single 64-bit
signed long is sufficient to represent the state.
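To illustrate the 48-bit point, here is a minimal PHP sketch of the
initial seed scrambling that the java.util.Random documentation
describes, (seed ^ 0x5DEECE66D) & ((1 << 48) - 1). The function name is
hypothetical and the sketch assumes a 64-bit PHP build. It only shows
that the upper 16 bits of the long never reach the state:

```php
<?php
// Hypothetical helper mirroring the documented java.util.Random seed
// scrambling. Assumes a 64-bit PHP build so that (1 << 48) fits in an int.
function javaRandomInitialState(int $seed): int
{
    $mask = (1 << 48) - 1;                 // keep only the low 48 bits
    return ($seed ^ 0x5DEECE66D) & $mask;  // XOR with the LCG multiplier, then mask
}

// Two seeds that differ only above bit 47 select the same sequence.
$a = javaRandomInitialState(42);
$b = javaRandomInitialState(42 + (1 << 48));
var_dump($a === $b); // bool(true)
```

So every state of that generator is reachable from the long seed, which
is not the case for a 128-bit state seeded from a single 64-bit integer.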
On the other hand, some languages have access to the complete internal
state. Python, for example, accepts bytes or bytearrays.
https://docs.python.org/3/library/random.html#random.seed
However, making strings available in PHP may lead to incorrect usage.
I think we can safely do this by making the seed argument accept both int
and string, and only using it as the internal state if string is specified
and it's 128-bits long.
That's a solution that would work for me.
1. Would you expect those two 'var_dump' calls to result in the same
output?
Added support for the __debugInfo() magic method.
https://github.com/php/php-src/pull/8094/commits/78efd2bd1e0ac5db48c272b364a615a5611e8caa
Don't forget to update the RFC accordingly. It would probably be helpful
if you would put the full class stubs into the RFC. I find that easier
to understand than a list of methods.
generate() should return raw bytes instead of a number (as I suggested
before).
I don't think this is a very good idea.
The RNG is a random number generator and should probably not be generating
strings.
I'd say that the 'number' part in RNG is not technically accurate. All
RNGs are effectively generators for a random sequence of bits. The
number part is just an interpretation of that random sequence of bits
(e.g. 64 of them).
Of course, I am aware that strings represent binary sequences in PHP.
However, this is not user-friendly.
The generation of a binary string is a barrier when trying to implement
some kind of operation using numeric computation.
I believe the average user of the RNG API would use the Randomizer
class, instead of the raw generators, thus they would not come in
contact with the raw bytes coming from the generator.
However, by getting PHP integers out of the generator, it is much harder
for me to process the raw bits and bytes, if that's something I need for
my use case.
As an example, suppose I want to implement the following in userland.
With raw bytes:
- For Randomizer::getBytes() I can just concatenate the raw bytes.
- For a random uint16BE I can grab 2 bytes and call unpack('n', $bytes)
If I get random 64-bit integers instead:
- For Randomizer::getBytes() I need to use pack(), and I'm not even sure
whether I need to use 'q', 'Q', 'J', or 'P' to receive an unbiased result.
- For uint16BE I can use "& 0xFFFF", but would waste 48 bits, unless I
also perform bit shifting to access the other bytes. But then there's
also the same signedness issue.
Interpreting numbers as bytes and vice versa in C / C++ is very easy.
However, in PHP userland I believe only the bytes -> numbers direction is
easy-ish. The numbers -> bytes direction is full of edge cases.
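The two directions can be compared with a minimal sketch. It assumes a
64-bit PHP build, uses fixed values instead of real generator output so
the result is deterministic, and the function names are hypothetical:

```php
<?php
// bytes -> number: decode two raw bytes as an unsigned 16-bit big-endian int.
function uint16FromBytes(string $twoBytes): int
{
    return unpack('n', $twoBytes)[1];
}

// number -> numbers: split one 64-bit int into four 16-bit words, highest
// word first. The mask after the shift hides the sign extension that ">>"
// performs on negative PHP integers.
function uint16WordsFromInt(int $value): array
{
    $words = [];
    foreach ([48, 32, 16, 0] as $shift) {
        $words[] = ($value >> $shift) & 0xFFFF;
    }
    return $words;
}

$raw = "\x01\x23\x45\x67\x89\xab\xcd\xef";  // stand-in for 8 generated bytes
$int = 0x0123456789abcdef;                  // the same bits as a PHP int

var_dump(uint16FromBytes(substr($raw, 0, 2)) === 0x0123);      // bool(true)
var_dump(uint16WordsFromInt($int)[0] === 0x0123);              // bool(true)

// number -> bytes: the format code decides the byte order of the result.
var_dump(bin2hex(pack('J', $int)));  // string(16) "0123456789abcdef" (big endian)
var_dump(bin2hex(pack('P', $int)));  // string(16) "efcdab8967452301" (little endian)
```

With raw bytes the caller only picks the unpack() format. With integers
the caller has to think about byte order, sign extension, and leftover
bits for every width that is needed.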
If you want to deal with the problem of generated size, it would be more
appropriate to define a method such as getGenerateSize() in the interface.
Even in this case, generation widths greater than PHP_INT_SIZE cannot be
supported, but generation widths greater than 64-bit are not very useful in
the first place.
The 'Randomizer' object should buffer unused bytes internally and only
call generate() if the internal buffer is drained.
Likewise, I think this is not a good idea. Buffering reintroduces the
problem of complex state management, which has been made so easy. The user
will always have to worry about the buffering size of the Randomizer.
Unfortunately you did not answer the primary question. The ones you
answered were just follow-up conclusions from the answer I would give:
var_dump(\bin2hex($r1->getBytes(8)));
var_dump(\bin2hex($r2->getBytes(4)) . \bin2hex($r2->getBytes(4)));
As a user: Would you expect those two 'var_dump' calls to result in the
same output?
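To make the question concrete, here is a minimal sketch with a toy
generator that returns 8 bytes per call. All class names are
hypothetical and are not the API from the RFC. It compares a randomizer
that buffers unused bytes with one that discards them:

```php
<?php
// Hypothetical toy generator: deterministic, 8 bytes per generate() call.
final class CounterGenerator
{
    private int $counter = 0;

    public function generate(): string
    {
        return pack('J', ++$this->counter);  // counter as 8 big-endian bytes
    }
}

// Keeps leftover bytes and only calls generate() when the buffer runs dry.
final class BufferedRandomizer
{
    private string $buffer = '';

    public function __construct(private CounterGenerator $generator) {}

    public function getBytes(int $length): string
    {
        while (strlen($this->buffer) < $length) {
            $this->buffer .= $this->generator->generate();
        }
        $result = substr($this->buffer, 0, $length);
        $this->buffer = substr($this->buffer, $length);
        return $result;
    }
}

// Throws away whatever is left over after each getBytes() call.
final class UnbufferedRandomizer
{
    public function __construct(private CounterGenerator $generator) {}

    public function getBytes(int $length): string
    {
        $bytes = '';
        while (strlen($bytes) < $length) {
            $bytes .= $this->generator->generate();
        }
        return substr($bytes, 0, $length);
    }
}

$r1 = new BufferedRandomizer(new CounterGenerator());
$r2 = new BufferedRandomizer(new CounterGenerator());
var_dump(bin2hex($r1->getBytes(8)));                             // "0000000000000001"
var_dump(bin2hex($r2->getBytes(4)) . bin2hex($r2->getBytes(4))); // "0000000000000001"

$u1 = new UnbufferedRandomizer(new CounterGenerator());
$u2 = new UnbufferedRandomizer(new CounterGenerator());
var_dump(bin2hex($u1->getBytes(8)));                             // "0000000000000001"
var_dump(bin2hex($u2->getBytes(4)) . bin2hex($u2->getBytes(4))); // "0000000000000000"
```

In this sketch the two calls only agree when the randomizer buffers.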
Why xorshift instead of xoshiro / xoroshiro?
The XorShift128Plus algorithm is still in use in major browsers and is
mature and well-proven.
I believe that the underlying RNG in web browsers is considered an
implementation detail, no?
For PHP this would be part of the API surface and would need to be
maintained indefinitely. Certainly it would make sense to use the latest
and greatest RNG, instead of something that is outdated when it's first
shipped, no?
Also, in our local testing, SplitMix64 + XorShift128Plus performed well in
terms of performance and random number quality, so I don't think it is
necessary to choose a different algorithm.
If this RFC passes, it will be easier to add algorithms in the future. If a
new algorithm is needed, it can be implemented immediately.
Best regards
Tim Düsterhus
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php