I agree with Rob's suggestion.  Crowd-sourcing some tests would be great.  

The test suites tend to require a version of the algorithm that can be run 
standalone inside the test harness.  The ScRandom() code could be transcribed 
to a clean C implementation that would be usable.  It could be used to spit out 
testable data streams too, but running the generator is considered superior and 
Dieharder does it that way.

In the case of the NIST tests, data streams can be submitted to the tests or 
the tests can be modified to run the generator internally.  That last should be 
much faster and allow more extensive tests.

Dieharder has this peculiar licensing that has me want a different solution.  
TestU01 was the strongest test used by Wichmann-Hill, but that also has a 
non-commercial use condition: 
<http://www.iro.umontreal.ca/~simardr/testu01/tu01.html>.

The original "Diehard Battery of Tests of Randomness" makes some folks nervous 
because of the Copyright Notice.  As far as I can tell, there is no issue in 
practice.  I suppose one needs to be cautious: 
<http://www.stat.fsu.edu/pub/diehard/>.

 - Dennis

PS: The NIST tests are for cryptographic randomness.  Wichmann-Hill #2 and 
Mersenne Twister should both fail those tests.  They are not intended to be 
cryptographically random, but seeing how they are not is useful to demonstrate. 
 

PPS: One feature of Wichmann-Hill #2 that is interesting has to do with 
row-column independence when employed in some manner in a grid of cells.  This 
characteristic is not directly controllable in Calc, and figuring out how to 
make it usable might be valuable.  One way it is useful is in generating seeds 
in a way that subsequent sequences are assured to be in different, 
non-overlapping portions of the very long period.  (This means being able to 
specify seeds and to save them for deriving future ones.)

-----Original Message-----
From: Pedro Giffuni [mailto:p...@apache.org] 
Sent: Tuesday, December 18, 2012 11:23
To: Rob Weir; dev@openoffice.apache.org
Subject: Re: Blog post





----- Messaggio originale -----
> Da: Rob Weir

> 
> 
> But would the earlier implementation also pass that same test?
> 
> Two test suites specifically for pseudo random number generators are:
> 
> Dieharder:  http://www.phy.duke.edu/~rgb/General/dieharder.php
> 
> and
> 
> this test from NIST:  http://csrc.nist.gov/groups/ST/toolkit/rng/index.html
> 
>>  The algorithm is known to pass some very strict statistical tests though.
>> 

Indeed I was referring to those two tests. Our new implementation
should pass them.

My simple understanding of the testing is that you usually don't embed the
test in a speadsheet: most testers will generate a sequence of, let's say
1 million numbers and pass them through the tests.

> 
> And that is an important thing, that we are starting with an algorithm
> that is already well known.  But any given implementation, due to
> subtleties of the implementation details, might not do as well as
> theory.  We see that with crypto all the time.

One thing that must be clear is that we don't generate crypto-grade
random numbers, that's far more complex than what I intended to
do. I was rather more concerned on the "Portable" part of PRNG:
it is critical that the RAND() function in Calc doesn't depend on
libc's rand() and it's a shame on whomever design the original
implementation. The algorithm we are using is not really very fast
or has a longest period it just solves a bug ;).


>   So it might be worth
> encouraging some more rigorous testing here.  In fact, maybe your blog
> post can help recruit some volunteers?   Say that we take our numeric
> algorithms very seriously, we're looking for the best.  This shows in
> our use of COIN-MP solver code, but also in our new PRNG.  We
> encourage anyone who wants to help to put our code through rigorous
> tests and help us find any problems, etc.
>

That is certainly welcome. I hope Andrea is taking notes.


We probably should implement Mersenne twister but with some tweaks
for the seeding, I just haven't seen the need to spend time on it. :)

cheers,

Pedro.

Reply via email to