On Tuesday, 8 January 2013 at 17:52:24 UTC, Joseph Rushton
Wakeling wrote:
Hello all,
Following discussion on the pull request adding normal random
number generation to std.random [
https://github.com/D-Programming-Language/phobos/pull/1029 ],
some issues were raised which are best discussed with the whole
community.
The heart of this is the design of pseudo-random number
generators (PRNGs) in Phobos. Currently these are implemented
as value types, which has a number of unpleasant effects:
-- They're expensive to pass around (some PRNGs have a size
of a MB or more)
-- Passing by value is statistically unsafe, as it can
result in identical
random sequences being generated in different parts of
the code. This
already affects at least one part of std.random itself:
see,
http://d.puremagic.com/issues/show_bug.cgi?id=8247
http://forum.dlang.org/thread/[email protected]
-- Simply passing by reference isn't an adequate solution,
as there will be
cases (as with RandomSample, detailed in bug 8247) where
you have to
store the RNG. Storing e.g. a pointer or reference would
be unsafe; the
only adequate solution is that PRNGs be (safe) reference
types.
monarch_dodra did some work on this which was set aside in the
short term because it would (unavoidably) be a breaking change
[ https://github.com/D-Programming-Language/phobos/pull/893 ].
To avoid this, the proposed solution is to create a std.random2.
However, these issues seem to me to have broader implications
for the design of random-number functionality in Phobos, so if
we're going to do a std.random2, it's worth mapping them out so
as to get them right.
The most obvious (to me) is that these issues which apply to
PRNGs apply equally well to random number distributions. For
example the Ziggurat algorithm requires storing several hundred
constants, so passing by value is expensive; and several
different algorithms generate and store multiple random
variates at a time, so copying/passing by value will result in
unintended correlations in sequences of variates.
This has further implications if for example we want to create
a VariateGenerator (or perhaps in D-ish terms, a VariateRange)
which couples a random distribution with a PRNG -- this is
unlikely to work unless both the random distribution and the
PRNG are reference types.
Finally, there are more general issues about how new
functionality should be implemented. C++11 is given as a model
in the std.random documentation, but this is clearly a guide
rather than something to copy blindly -- existing functions and
structs already deviate from it in ways that reflect D's
preference for ranges and its superior generics. We need a
clear picture of how to do this for functionality that has not
yet been implemented.
For example: in the case of random distributions, the current
example of uniform() offers only a function interface and
therefore little guidance about how to create struct
implementations for more complex algorithms which require
persistent storage (e.g. Ziggurat or Box-Muller). Should they
follow C++11/Boost.Random in returning variates via opCall, or
should they be coupled with a PRNG at construction time (as
with RandomSample) and implement a range of variates?
My inclination here is to take some time to map out the
different interface/design options and to present the choices
to the community for review as a precursor to creating a
std.random2. It seems the only really sensible choice to
ensure that we get a good future-proof design.
What does everyone think?
Best wishes,
-- Joe
I imagine there has been some detailed discussion of the
std.nameX idea of libraries so forgive me if this has been
discussed. Using this as an approach to essentially replacing
libraries instead of the depreciation route would seem to
potentially lead to a situation where you have std.name1, 2, 3, 4
ad infinitum which may not have exactly the same feature set or
use cases and would lead to people hunting around multiple
libraries which on the face of it are for the same thing and
using them in projects. Would it not be better to bite the bullet
and depreciate? random2 simply sounds like random done properly,
with random and random2 people would often use random, and so
still suffer the consequences of its issues, and possibly
overlook random2. You're depreciating because it's broken, not
because people want to mess around with trivialities.