I agree with Rob's suggestion. Crowd-sourcing some tests would be great.
The test suites tend to require a version of the algorithm that can be run standalone inside the test harness. The ScRandom() code could be transcribed to a clean C implementation that would be usable. It could be used to spit out testable data streams too, but running the generator is considered superior and Dieharder does it that way. In the case of the NIST tests, data streams can be submitted to the tests or the tests can be modified to run the generator internally. That last should be much faster and allow more extensive tests. Dieharder has this peculiar licensing that has me want a different solution. TestU01 was the strongest test used by Wichmann-Hill, but that also has a non-commercial use condition: <http://www.iro.umontreal.ca/~simardr/testu01/tu01.html>. The original "Diehard Battery of Tests of Randomness" makes some folks nervous because of the Copyright Notice. As far as I can tell, there is no issue in practice. I suppose one needs to be cautious: <http://www.stat.fsu.edu/pub/diehard/>. - Dennis PS: The NIST tests are for cryptographic randomness. Wichmann-Hill #2 and Mersenne Twister should both fail those tests. They are not intended to be cryptographically random, but seeing how they are not is useful to demonstrate. PPS: One feature of Wichmann-Hill #2 that is interesting has to do with row-column independence when employed in some manner in a grid of cells. This characteristic is not directly controllable in Calc, and figuring out how to make it usable might be valuable. One way it is useful is in generating seeds in a way that subsequent sequences are assured to be in different, non-overlapping portions of the very long period. (This means being able to specify seeds and to save them for deriving future ones.) -----Original Message----- From: Pedro Giffuni [mailto:p...@apache.org] Sent: Tuesday, December 18, 2012 11:23 To: Rob Weir; dev@openoffice.apache.org Subject: Re: Blog post ----- Messaggio originale ----- > Da: Rob Weir > > > But would the earlier implementation also pass that same test? > > Two test suites specifically for pseudo random number generators are: > > Dieharder: http://www.phy.duke.edu/~rgb/General/dieharder.php > > and > > this test from NIST: http://csrc.nist.gov/groups/ST/toolkit/rng/index.html > >> The algorithm is known to pass some very strict statistical tests though. >> Indeed I was referring to those two tests. Our new implementation should pass them. My simple understanding of the testing is that you usually don't embed the test in a speadsheet: most testers will generate a sequence of, let's say 1 million numbers and pass them through the tests. > > And that is an important thing, that we are starting with an algorithm > that is already well known. But any given implementation, due to > subtleties of the implementation details, might not do as well as > theory. We see that with crypto all the time. One thing that must be clear is that we don't generate crypto-grade random numbers, that's far more complex than what I intended to do. I was rather more concerned on the "Portable" part of PRNG: it is critical that the RAND() function in Calc doesn't depend on libc's rand() and it's a shame on whomever design the original implementation. The algorithm we are using is not really very fast or has a longest period it just solves a bug ;). > So it might be worth > encouraging some more rigorous testing here. In fact, maybe your blog > post can help recruit some volunteers? Say that we take our numeric > algorithms very seriously, we're looking for the best. This shows in > our use of COIN-MP solver code, but also in our new PRNG. We > encourage anyone who wants to help to put our code through rigorous > tests and help us find any problems, etc. > That is certainly welcome. I hope Andrea is taking notes. We probably should implement Mersenne twister but with some tweaks for the seeding, I just haven't seen the need to spend time on it. :) cheers, Pedro.