[
https://issues.apache.org/jira/browse/RNG-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961037#comment-16961037
]
Alex Herbert commented on RNG-86:
---------------------------------
h1. PractRand
PractRand is a test suite that can test the output of RNGs. It takes a
different approach from Dieharder and TestU01. Those test suites run a fixed
list of tests and use the RNG output for each test in turn. Each test requires
an approximately fixed amount of output. Thus in general each test uses
different output from the RNG and the entire test suite takes approximately the
same length of time irrespective of the RNG.
A run of PractRand also chooses a set of tests, but the output of the RNG is
passed to *all* of the tests concurrently and assessed at intervals. The
intervals successively double until the maximum length of the test is reached
or a failure occurs. The result is that PractRand can fail very fast or run for
a long time. The JDK generator fails in 4 seconds; 4TB of output from PCG RS 32
takes over 12.7 hours when PractRand is run multi-threaded (for an approximate
speed-up of 3-fold). In contrast the same machine will run BigCrush in 5.2
hours.
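The doubling schedule described above can be sketched as follows. This is only an illustration of the idea, not PractRand's implementation: the function name, the {{fails_at}} callback and the 1 KiB starting size are assumptions made for the example.

```python
def first_failure(fails_at, start_bytes=2**10, max_bytes=2**42):
    """Return the first checkpoint size (in bytes) at which `fails_at`
    reports a failure, or None if the maximum length is reached first.

    `fails_at` is a hypothetical callback standing in for the assessment
    of all tests at a given output size."""
    size = start_bytes
    while size <= max_bytes:
        if fails_at(size):
            return size      # stop as soon as a failure is seen
        size *= 2            # intervals successively double
    return None              # no failure up to the maximum length


# A weak generator that is detected at 2^19 bytes stops almost immediately;
# a generator that never fails is run to the maximum length (here 4 TiB).
weak = first_failure(lambda n: n >= 2**19)
strong = first_failure(lambda n: False)
```

A generator that fails early therefore costs seconds, while one that passes costs the full run time for the maximum length.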
PractRand can test 8, 16, 32 or 64-bit output. The output size of the RNG is
important as PractRand has the concept of folding, where sections of the output
are extracted into smaller sequences and assessed.
The folding options test the following least significant bits of the output
(n/m appears to denote the lowest n bits of each m-bit block of output):
||Option||Output||Folding transforms||
|-tf 0 (no folding)|ignored|none|
|-tf 1 (smart folding)|32-bit|1/32,8/32|
|-tf 1 (smart folding)|64-bit|1/64, 4/64, 16/64|
|-tf 2 (extended folding)|ignored|1/8,1/16,1/32,1/64,4/16,4/32,4/64,8/32,8/64|
The default mode is smart folding (-tf 1).
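To illustrate what a folding such as 1/32 or 4/32 extracts, here is a minimal sketch that keeps the lowest bits of each output value and packs them into a new byte stream. It is not taken from PractRand: the function name and the bit-packing order (most significant bit first) are assumptions chosen for the example.

```python
def fold_low_bits(values, bits):
    """Keep the `bits` least significant bits of each value and concatenate
    them into a byte stream (packing order is an assumption)."""
    mask = (1 << bits) - 1
    acc = 0        # bit accumulator
    nbits = 0      # number of valid bits held in the accumulator
    out = bytearray()
    for v in values:
        acc = (acc << bits) | (v & mask)
        nbits += bits
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # drop the bits already written
    return bytes(out)


# Lowest 4 bits of two 32-bit values form one byte.
low4 = fold_low_bits([0xFFFFFFF1, 0xFFFFFFF1], 4)
# Lowest bit of eight 32-bit values forms one byte.
low1 = fold_low_bits([1, 0, 1, 0, 1, 0, 1, 0], 1)
```

The folded stream is much shorter than the original (1/32 keeps one bit in 32), which is why weaknesses confined to the low bits can be detected more readily in it.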
There are 2 main options for test suites.
||Option||Tests||
|-ts 0|core|
|-ts 1|extended|
The default is the core test set (-ts 0). This uses a set of tests which are
orthogonal, i.e. they test different characteristics of the output. The
extended test set contains repeats of many of the tests with different
parameters. It is also variable enough that the author notes that benchmarks
may need to be re-run between versions.
The documentation contains notes about the expected run time of the different
modes:
{noformat}
normal test set, no folding: 9.6 seconds per GB
normal test set, standard folding: 13.5 seconds per GB (recommended)
normal test set, extra folding: 23 seconds per GB
expanded test set, no folding: 25 seconds per GB
expanded test set, standard folding: 32 seconds per GB
expanded test set, extra folding: 54 seconds per GB
{noformat}
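The quoted rates allow a rough estimate of single-threaded run time for a given output length. The sketch below assumes 1 TB = 1024 GB and uses hypothetical names; the rates apply to the hardware used for the PractRand documentation, so actual times on other machines will differ.

```python
# Seconds per GB as quoted in the PractRand documentation.
SECONDS_PER_GB = {
    ("normal", "none"): 9.6,
    ("normal", "standard"): 13.5,
    ("normal", "extra"): 23.0,
    ("expanded", "none"): 25.0,
    ("expanded", "standard"): 32.0,
    ("expanded", "extra"): 54.0,
}


def estimated_hours(test_set, folding, output_gb):
    """Estimated run time in hours for `output_gb` gigabytes of output."""
    return SECONDS_PER_GB[(test_set, folding)] * output_gb / 3600.0


# Default settings to 4 TB versus expanded tests with extra folding to 1 TB.
default_4tb = estimated_hours("normal", "standard", 4 * 1024)
expanded_1tb = estimated_hours("expanded", "extra", 1024)
```

On these figures the default settings run to 4 TB take about the same time as the expanded tests with extra folding run to 1 TB (roughly 15.4 hours each).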
The documentation says of the extra foldings:
{noformat}
"This may cause excessive amounts of memory to be used on longer data streams"
{noformat}
Example memory usage I have noted using the resident set size (RSS) of the
process when run using the core tests (-ts 0):
||Folding (-tf)||Output (TB)||Mode||Resident Set Size (KB)||GB||
|1|1|32-bit| 732792|0.733|
|2|1|32-bit|1271424|1.27|
|1|1|64-bit|800148|0.800|
|2|1|64-bit|1304116|1.30|
|1|4|32-bit|1239440|1.24|
|2|4|32-bit|2314232|2.31|
|1|4|64-bit|1324436|1.32|
|2|4|64-bit|?|?|
So the 64-bit smart folding (-tf 1) requires a bit more memory than the 32-bit
mode. The extended foldings (-tf 2) use roughly 60-75% more memory at a maximum
of 1 TB and nearly 2-fold more memory at 4 TB. I do not have a trial for
extended folding in 64-bit mode at 4 TB but since the mode is ignored (all
foldings are always done) it will probably be approximately the same as in
32-bit mode.
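The relative memory cost of the extended foldings can be computed directly from the table. The sketch below copies the measured values; the dictionary layout and function name are only for illustration.

```python
# Measured memory use in KB, keyed by (folding option, output in TB, mode in bits).
MEMORY_KB = {
    (1, 1, 32): 732792,
    (2, 1, 32): 1271424,
    (1, 1, 64): 800148,
    (2, 1, 64): 1304116,
    (1, 4, 32): 1239440,
    (2, 4, 32): 2314232,
    (1, 4, 64): 1324436,
}


def extended_to_smart_ratio(output_tb, mode):
    """Memory of extended folding (-tf 2) relative to smart folding (-tf 1)."""
    return MEMORY_KB[(2, output_tb, mode)] / MEMORY_KB[(1, output_tb, mode)]


ratio_1tb_32 = extended_to_smart_ratio(1, 32)   # about 1.74
ratio_1tb_64 = extended_to_smart_ratio(1, 64)   # about 1.63
ratio_4tb_32 = extended_to_smart_ratio(4, 32)   # about 1.87
```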
h1. Results
In the following results the number is the size of the output where failure
occurred expressed as a power of 2. Hence higher is better. No failure is
marked with a dash.
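The tables list both the exponent and the corresponding size. As a small sketch of the conversion (the function name is hypothetical), an exponent n corresponds to 2^n bytes of output:

```python
def format_power_of_two(exponent):
    """Format 2**exponent bytes using binary units, e.g. 19 -> '512 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    index = min(exponent // 10, len(units) - 1)
    return f"{2 ** (exponent - 10 * index)} {units[index]}"


jdk = format_power_of_two(19)    # '512 KiB'
well = format_power_of_two(36)   # '64 GiB'
limit = format_power_of_two(42)  # '4 TiB'
```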
h2. PractRand v0.93 -tf 2 -ts 1 -tlmax 1TB:
Extended foldings.
Extended tests.
{noformat}
RNG PractRand ∩
JDK 19 512 KiB
WELL_512_A 21 2 MiB
WELL_1024_A 24 16 MiB
WELL_19937_A 36 64 GiB
WELL_19937_C 36 64 GiB
WELL_44497_A 39 512 GiB
WELL_44497_B 39 512 GiB
MT 32 4 GiB
ISAAC -
SPLIT_MIX_64 -
XOR_SHIFT_1024_S 28 256 MiB
TWO_CMRES 21 2 MiB
MT_64 36 64 GiB
MWC_256 -
KISS -
XOR_SHIFT_1024_S_PHI 30 1 GiB
XO_RO_SHI_RO_64_S 18 256 KiB
XO_RO_SHI_RO_64_SS -
XO_SHI_RO_128_PLUS 21 2 MiB
XO_SHI_RO_128_SS -
XO_RO_SHI_RO_128_PLUS 22 4 MiB
XO_RO_SHI_RO_128_SS -
XO_SHI_RO_256_PLUS 24 16 MiB
XO_SHI_RO_256_SS -
XO_SHI_RO_512_PLUS 27 128 MiB
XO_SHI_RO_512_SS -
PCG_XSH_RR_32 -
PCG_XSH_RS_32 -
PCG_RXS_M_XS_64 -
PCG_MCG_XSH_RR_32 -
PCG_MCG_XSH_RS_32 -
MSWS -
SFC_32 -
SFC_64 -
JSF_32 -
JSF_64 -
XO_SHI_RO_128_PP -
XO_RO_SHI_RO_128_PP -
XO_SHI_RO_256_PP -
XO_SHI_RO_512_PP -
XO_RO_SHI_RO_1024_PP -
XO_RO_SHI_RO_1024_S 30 1 GiB
XO_RO_SHI_RO_1024_SS -
{noformat}
This configuration was not run for multiple trials; the attempt was aborted.
The extended test set with extra folding increases run time considerably. It
also increases memory usage to the extent that parallel jobs on the same
machine can run out of memory as each task can use >2GB of RAM.
h2. PractRand v0.93 -tf 2 -ts 0 -tlmax 1TB:
Extended foldings.
Core tests.
{noformat}
RNG PractRand ∩
JDK 18,19,19,19,19 512 KiB
WELL_512_A 24,24,24,24,24 16 MiB
WELL_1024_A 27,27,27,27,27 128 MiB
WELL_19937_A 39,39,39,39,39 512 GiB
WELL_19937_C 39,39,39,39,39 512 GiB
WELL_44497_A -,-,-,-,-
WELL_44497_B -,-,-,-,-
MT 35,35,35,35,35 32 GiB
ISAAC -,-,-,-,-
SPLIT_MIX_64 -,-,-,-,-
XOR_SHIFT_1024_S 31,31,31,31,31 2 GiB
TWO_CMRES 21,21,21,21,21 2 MiB
MT_64 39,39,39,39,39 512 GiB
MWC_256 -,-,-,-,-
KISS -,-,-,-,-
XOR_SHIFT_1024_S_PHI 33,33,33,33,33 8 GiB
XO_RO_SHI_RO_64_S 21,21,21,21,21 2 MiB
XO_RO_SHI_RO_64_SS -,-,-,-,-
XO_SHI_RO_128_PLUS 24,24,24,24,24 16 MiB
XO_SHI_RO_128_SS -,-,-,-,-
XO_RO_SHI_RO_128_PLUS 25,25,25,25,25 32 MiB
XO_RO_SHI_RO_128_SS -,-,-,-,-
XO_SHI_RO_256_PLUS 27,27,27,27,27 128 MiB
XO_SHI_RO_256_SS -,-,-,-,-
XO_SHI_RO_512_PLUS 30,30,30,30,30 1 GiB
XO_SHI_RO_512_SS -,-,-,-,-
PCG_XSH_RR_32 -,-,-,-,-
PCG_XSH_RS_32 -,-,-,-,-
PCG_RXS_M_XS_64 -,-,-,-,-
PCG_MCG_XSH_RR_32 -,-,-,-,-
PCG_MCG_XSH_RS_32 -,-,-,-,-
MSWS -,-,-,-,-
SFC_32 -,-,-,-,-
SFC_64 -,-,-,-,-
JSF_32 -,-,-,-,-
JSF_64 -,-,-,-,-
XO_SHI_RO_128_PP -,-,-,-,-
XO_RO_SHI_RO_128_PP -,-,-,-,-
XO_SHI_RO_256_PP -,-,-,-,-
XO_SHI_RO_512_PP -,-,-,-,-
XO_RO_SHI_RO_1024_PP -,-,-,-,-
XO_RO_SHI_RO_1024_S 33,33,33,33,33 8 GiB
XO_RO_SHI_RO_1024_SS -,-,-,-,-
{noformat}
This test was run for multiple trials (5 per generator). Note that:
* Almost all failures occurred using foldings that are present in the default
smart folding set
* The WELL_44497 generators do not fail. These are known to fail TestU01
BigCrush.
Here are failures from the extended folding set:
{noformat}
MT PractRand 32 GiB
MT PractRand [Low4/16]BRank(12):6K(1)
TWO_CMRES PractRand 2 MiB
TWO_CMRES PractRand [Low4/32]Gap-16:A
{noformat}
h2. PractRand v0.94 -tf 1 -ts 0 -tlmax 4TB:
Smart foldings.
Core tests.
Note: These are the default settings but reduced from 32TB of output to 4TB.
Using the smart folding and core test set allows the test to execute twice as
fast so a higher maximum output was possible.
This is a PractRand version switch from v0.93 to v0.94. The v0.94 distribution
download does not build on Linux so I initially tested using v0.93. I then
created a patch to fix v0.94 to allow testing using the newer version. The
patch has been added to the examples-stress src folder.
{noformat}
RNG PractRand ∩
JDK 20,20,20 1 MiB
WELL_512_A 24,24,24 16 MiB
WELL_1024_A 27,27,27 128 MiB
WELL_19937_A 39,39,39 512 GiB
WELL_19937_C 39,39,39 512 GiB
WELL_44497_A 42,42,42 4 TiB
WELL_44497_B 42,42,42 4 TiB
MT 38,38,38 256 GiB
ISAAC -,-,-
SPLIT_MIX_64 -,-,-
XOR_SHIFT_1024_S 31,31,31 2 GiB
TWO_CMRES 32,32,32 4 GiB
MT_64 39,39,39 512 GiB
MWC_256 -,-,-
KISS -,-,-
XOR_SHIFT_1024_S_PHI 33,33,33 8 GiB
XO_RO_SHI_RO_64_S 21,21,21 2 MiB
XO_RO_SHI_RO_64_SS -,-,-
XO_SHI_RO_128_PLUS 24,24,24 16 MiB
XO_SHI_RO_128_SS -,-,-
XO_RO_SHI_RO_128_PLUS 25,25,25 32 MiB
XO_RO_SHI_RO_128_SS -,-,-
XO_SHI_RO_256_PLUS 27,27,27 128 MiB
XO_SHI_RO_256_SS -,-,-
XO_SHI_RO_512_PLUS 30,30,30 1 GiB
XO_SHI_RO_512_SS -,-,-
PCG_XSH_RR_32 -,-,-
PCG_XSH_RS_32 41,-,-
PCG_RXS_M_XS_64 -,-,-
PCG_MCG_XSH_RR_32 -,-,-
PCG_MCG_XSH_RS_32 40,41,41 2 TiB
MSWS -,-,-
SFC_32 -,-,-
SFC_64 -,-,-
JSF_32 -,-,-
JSF_64 -,-,-
XO_SHI_RO_128_PP -,-,-
XO_RO_SHI_RO_128_PP -,-,-
XO_SHI_RO_256_PP -,-,-
XO_SHI_RO_512_PP -,-,-
XO_RO_SHI_RO_1024_PP -,-,-
XO_RO_SHI_RO_1024_S 33,33,33 8 GiB
XO_RO_SHI_RO_1024_SS -,-,-
{noformat}
Note that with the switch to v0.94 the core test suite was updated. The test
where JDK failed in v0.93 has been removed. Thus JDK gets to twice the output
before failing a different test (still at 4 seconds). This test was replaced by
a new test that mainly targets LCGs. Interestingly this makes PCG XSH RS fail
once at 2TB. The variant PCG MCG XSH RS systematically fails at 2TB.
Also note that the failures that occurred on extended folding are now observed
later on the smart foldings:
{noformat}
MT 32 GiB -> 256 GiB
TWO_CMRES 2 MiB -> 4 GiB
{noformat}
In particular TWO_CMRES produces about 2000 times as much output before
failing (2 MiB to 4 GiB is a factor of 2048), although the output size is still
relatively small.
The WELL_44497 generators now fail, identifying problems that are also found by
TestU01 BigCrush.
h1. Conclusion
I tried a few variants of testing. Using extended folding was much slower and
failed few generators that would not have failed anyway on the smart foldings.
Using the extended test suite is not recommended by the author. The tests are
more comprehensive but are not orthogonal. There is a lot of overlap. The extra
tests make the run-time longer and significantly increase the memory overhead.
The extra tests only seem to make the MT generator fail much faster. Using the
smart folding with the core tests allows running to 4TB of output in the same
length of time as 1TB with extended folding and extra tests. At 4TB we see
failures of the WELL_44497 generators and PCG_MCG_XSH_RS so this extra length
is useful.
I only did 3 runs. Computation time is large compared to TestU01 BigCrush. In
contrast to TestU01 the entire output of the generator is put through every
test. I believe it is a better use of resources to try to extend the run-time
to longer output rather than to include more runs at the same length. The seed
becomes largely irrelevant when 4TB of output is generated (i.e. 2^39 long
values). So I can commit the current results to the user guide and then spend a
few weeks running PractRand to 32TB output for a single trial run to see what
happens.
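As a check of the arithmetic, the number of values consumed for a given output length follows from the value size. The sketch below treats 4TB as 4 TiB (2^42 bytes), an assumption consistent with the binary units in the result tables; the function name is hypothetical.

```python
def values_in(output_bytes, bits_per_value):
    """Number of RNG values of the given width needed for `output_bytes`."""
    return output_bytes * 8 // bits_per_value


four_tib = 4 * 2**40
longs_needed = values_in(four_tib, 64)   # 2**39 64-bit values
ints_needed = values_in(four_tib, 32)    # 2**40 32-bit values
```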
h1. Stats
The total testing times for each of the test suites in the current set of
results are:
||Tests||Total||Cores||Notes||
|Dieharder|13 days 19:05:49.59|2|Workstation 1 |
|TestU01 BigCrush|47 days 00:50:11.79|2|Workstation 1 |
|PractRand|35 days 12:43:34.94|4|Workstation 1 for some generators; Workstation
2 for the other generators.
The second machine runs approximately 2-fold faster than workstation 1. This
may be due to improved hardware or the switch from g++ 4 to g++ 5 which updates
support from C++11 to C++14.|
Note that to work around the high memory usage of PractRand (which uses up to
1.3GB per job when run to 4TB of output on default settings) I use the
multi-threaded mode of PractRand. It can use up to 5 threads but usage is
variable and there is an approximate 3-fold increase in speed when
multi-threaded. Thus results have been generated using 3 threads per testing
process. This allows long-running parallel jobs on a benchmarking machine to
run without saturating memory.
> PractRand
> ---------
>
> Key: RNG-86
> URL: https://issues.apache.org/jira/browse/RNG-86
> Project: Commons RNG
> Issue Type: Wish
> Components: examples
> Reporter: Gilles Sadowski
> Priority: Minor
>
> Integrate another test suite to the {{RandomStressTester}} application:
> [http://pracrand.sourceforge.net/]
> The library also contains many RNG implementations (C++).
> FTR: https://markmail.org/message/74zmora4jrhwb5hu