Re: [math] Improving tests and performance of RandomGenerator implementations

Phil Steitz Wed, 03 Aug 2011 09:15:59 -0700

On 8/3/11 9:02 AM, sebb wrote:
> On 3 August 2011 09:06, Luc Maisonobe <[email protected]> wrote:
>> Le 03/08/2011 09:38, Luc Maisonobe a écrit :
>>> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
>>>> Hi Phil,
>>>>
>>>> Le 01/08/2011 20:39, Phil Steitz a écrit :
>>>>> On 8/1/11 1:31 AM, [email protected] wrote:
>>>>>> Hi Phil,
>>>>>>
>>>>>> ----- Mail original -----
>>>>>>> In my own applications, I noticed what appears to be poor
>>>>>>> performance in the nextInt(int) method of the Mersenne twister,
>>>>>>> which I was using to *improve* speed. I think that for small n, the
>>>>>>> default implementation in BistreamGenerator may be running too many
>>>>>>> iterations.
>>>>>> Mersenne twister uses a quite large pool. It creates pseudo-random bits
>>>>>> by twisting it and creates large bunches at a time (624 words at a
>>>>>> time).
>>>>>> Hence when you ask for large sets, you should have several calls that
>>>>>> return fast, and one call that takes a longer time to generate another
>>>>>> large pool.
>>>>>>
>>>>>> So good performances are obtained for generating large sets, not
>>>>>> small sets.
>>>>>>
>>>>>> Well generators should be faster and are preferred over Mersenne
>>>>>> twister now,
>>>>>> which is now an old generator. Well generators also have large pools,
>>>>>> but they
>>>>>> don't generate bits in large batches in advance, they do generate a
>>>>>> few words
>>>>>> at a time.
>>>>> Yeah, I know. Both are faster than the JDK, though, even for just
>>>>> 32-bit chunks in my tests at least.
>>>>>
>>>>> One thing I have been thinking about is exposing nextInt[],
>>>>> nextDouble[], nextGaussian[] etc methods that take advantage of the
>>>>> pools. So you basically generate a large block of bits use this to
>>>>> fill the output arrays.
>>>> Seems a very good idea. Most of the time, people generate only one kind
>>>> of numbers several times, so it really does make sense.
>>>>
>>>>>>> I am still figuring out how the code works, but I
>>>>>>> thought it would be good to run some benchmarks - using Gilles' new
>>>>>>> stuff - against the Harmony implementation in java.util.Random of
>>>>>>> this method. That led me to notice that there are no unit tests for
>>>>>>> BitstreamGenerator. I propose that we add
>>>>>>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
>>>>>>> method including fixed seed tests for all RandomGenerator methods
>>>>>>> 1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
>>>>>>> implementing makeGenerator with a BitStreamGenerator that uses the
>>>>>>> JDK generator for next(int)
>>>>>>> 2) Make the test classes for Mersenne and Weil generators extend
>>>>>>> RandomGeneratorAbstractTest, moving redundant tests up into the base
>>>>>>> class
>>>>>>>
>>>>>>> Sound reasonable?
>>>>>> +1
>>>>>>
>>>>>>> Also, any recollection why we are using a
>>>>>>> different implementation in BitStreamGenerator for next(int) than
>>>>>>> Harmony and the JDK use?
>>>>>> I don't understand what you mean. next(int) is used to generate the raw
>>>>>> bits and is the heart of each generator. Each generator has its own
>>>>>> implementation. Replacing next(int) by the JDK generation would imply
>>>>>> dropping completely Mersenne twister and Well generators.
>>>>> I am sorry. I meant nextInt(int). It is that code that seems to be
>>>>> slow in BitStreamGenerator and different from the JDK and Harmony.
>>>> Could you point me at some code ? There are many pitfalls in netInt(int)
>>>> if one wants to make sure the generator is uniform, which explain the
>>>> strange implementation, with the mask computation and the loop. By the
>>>> way, even this implementation would benefit from your proposed array
>>>> generation, as the mask could be computed only once.
>>> I have looked at the implementation for JDK and Harmony and am a little
>>> puzzled.
>>>
>>> The trick for the power of two (i.e. if ((n & -n) == n)) is not useful
>>> for the very elaborate generators like Mersenne twister or Well. Both
>>> are proven to be equidistributed even for the low order bits. They are
>>> based on linear recurrences but not linear congruences and do not suffer
>>> from the drawbacks of the latter.
>>>
>>> What puzzles me more is the loop. It is documented as avoiding the
>>> uneven distributions, but at first glance the modulo operation bothers
>>> me. As documentation explicitly states it is designed for this, it is
>>> most probably true, I simply don't understand how yet.
>>>
>>> So our current implementation is slow, then go ahead and change it to
>>> the one you showed me. I would simply suggest to get rid of the ((n &
>>> -n) == n) test. I'll try to understand the condition in the while loop
>>> to understand how it rejects uneven distributions, just out of curiosity
>>> for myself.
>> OK, I finally understood the algorithm and how it rejects the largest
>> incomplete numbers from k*n to (2^31)-1 where k*n is the largest multiple of
>> n that fits in a positive integer. The trick lies in the addition of (n-1)
>> which overflows the integer and wraps the result back to negative values. It
>> is smart.
>>
>> +1 to use it.
> Provided that the algorithm is documented ...


Yeah, I was going to try to decipher it (and the current impl) and
provide some doc.  One other thing to consider in this decision is
do we have to worry about encumberance.  The Harmony impl looks very
similar to what is described in the JDK javadoc.  I wonder if
SunOracle might have claim to it.

Where did you get the current impl, Luc?

Now that we have test infrastructure and Gilles' new tool, I will
run some benchmarks to see what, if anything, the Harmony/JDK impl
buys us.

Phil
>
>> Luc
>>
>>> Luc
>>>
>>>> Luc
>>>>
>>>>
>>>>> Phil
>>>>>> Mersenne twister and Well should be fast for generating large sets, but
>>>>>> most importantly they have very good and *proven* properties
>>>>>> (equidistribution
>>>>>> on large dimensions, null correlation, maximal period ...). These
>>>>>> properties
>>>>>> are essential for example in Monte-Carlo simulations with lots of
>>>>>> variables that
>>>>>> must be independent or have controlled correlations.
>>>>>>
>>>>>> Luc
>>>>>>
>>>>>>> The Harmony impl is almost identical to
>>>>>>> what is documented in the JDK javadoc.
>>>>>>>
>>>>>>> Phil
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>> For additional commands, e-mail: [email protected]
>>>>>>
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [math] Improving tests and performance of RandomGenerator implementations

Reply via email to