[jira] [Commented] (RNG-86) PractRand

Alex Herbert (Jira) Mon, 28 Oct 2019 06:25:51 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961037#comment-16961037
 ]


Alex Herbert commented on RNG-86:
---------------------------------

h1. PractRand

PractRand is a test suite that can test the output of RNGs. Testing is 
performed using a different approach to Dieharder and TestU01. Those test 
suites choose a list of tests. The RNG output is then used for each test in 
turn. Each test requires an approximately fixed size of output. Thus in general 
each test uses different output from the RNG and the entire test suite takes 
the same length of time irrespective of the RNG.

A run of PractRand must choose the set of tests. The output of the RNG is 
passed to *all* of the tests concurrently and assessed at intervals. The 
intervals successively double until the maximum length of the test is reached 
or failure occurs. The result is that PractRand can fail very fast or run a 
long time. The JDK random fails in 4 seconds; 4TB of output from PCG RS 32 
takes over 12.7 hours when PractRand is run multi-threaded (for an approximate 
speed-up of 3-fold). In contrast the same machine will run BigCrush in 5.2 
hours.
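
The doubling schedule can be sketched as follows. This only illustrates the interval scheme described above and is not PractRand's implementation; the function name and the 1 KiB starting size are assumptions made for the example.

```python
def checkpoints(start_bytes, max_bytes):
    """Yield the output sizes at which results are assessed.

    Each size is double the previous one, up to and including max_bytes.
    """
    size = start_bytes
    while size <= max_bytes:
        yield size
        size *= 2


KIB = 2**10
TIB = 2**40

# Assumption: the first assessment is at 1 KiB (illustrative value only).
sizes = list(checkpoints(KIB, 4 * TIB))
print(len(sizes), "assessment points; the last is at", sizes[-1], "bytes")
```

A generator that fails early stops at one of the first few sizes, which is why the run time varies so much between generators.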

PractRand can test 8, 16, 32 or 64-bit output. The output size of the RNG is 
important as PractRand has the concept of folding, where sections of the output 
are extracted into smaller sequences and assessed.

The folding options test the following least significant bits of the output:
||Option||Output||Folding transforms||
|-tf 0 (no folding)|ignored|none|
|-tf 1 (smart folding)|32-bit|1/32,8/32|
|-tf 1 (smart folding)|64-bit|1/64, 4/64, 16/64|
|-tf 2 (extended folding)|ignored|1/8,1/16,1/32,1/64,4/16,4/32,4/64,8/32,8/64|

The default mode is smart folding (-tf 1).
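
As an illustration of what a folding transform such as 1/32 or 8/32 does, here is a minimal sketch that keeps the least significant bits of each output word and packs them into a new byte sequence. The function name and the bit packing order are assumptions; PractRand's internal layout may differ.

```python
def fold_low_bits(words, keep_bits):
    """Concatenate the keep_bits least significant bits of each word.

    Bits are packed least-significant first (an assumption for this sketch).
    Trailing bits that do not fill a whole byte are dropped.
    """
    mask = (1 << keep_bits) - 1
    acc = 0        # bit accumulator
    acc_bits = 0   # number of valid bits held in acc
    out = bytearray()
    for w in words:
        acc |= (w & mask) << acc_bits
        acc_bits += keep_bits
        while acc_bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            acc_bits -= 8
    return bytes(out)


# 8/32 folding: keep the low byte of each 32-bit word.
low8 = fold_low_bits([0x12345678, 0x9ABCDEF0], 8)
# 1/32 folding: keep the lowest bit of each 32-bit word.
low1 = fold_low_bits([0x00000001] * 8, 1)
```

The folded sequence is then treated as a separate stream, so weak low bits are exposed even when the full-width output looks random.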

There are 2 main options for test suites.
||Option||Tests||
|-ts 0|core|
|-ts 1|extended|

The default test set is core (-ts 0). This uses a set of tests which are 
orthogonal, i.e. they test different characteristics of the output. The 
extended test set contains repeats of many of the tests with different 
parameters. It is also variable enough that the author notes that it may 
require a re-run of benchmarks between versions.

The documentation contains notes about the expected run time of the different 
modes: 
{noformat}
normal test set, no folding: 9.6 seconds per GB
normal test set, standard folding: 13.5 seconds per GB (recommended)
normal test set, extra folding: 23 seconds per GB
expanded test set, no folding: 25 seconds per GB
expanded test set, standard folding: 32 seconds per GB
expanded test set, extra folding: 54 seconds per GB
{noformat}
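
The quoted rates can be turned into rough run-time estimates. This is a sketch under the assumption that the rates scale linearly with output size, that 1 TB is 1024 GB, and that the figures apply to a single thread on the documentation author's hardware; the function and dictionary names are illustrative.

```python
# Seconds per GB as quoted from the PractRand documentation above.
SECONDS_PER_GB = {
    ("normal", "none"): 9.6,
    ("normal", "standard"): 13.5,
    ("normal", "extra"): 23.0,
    ("expanded", "none"): 25.0,
    ("expanded", "standard"): 32.0,
    ("expanded", "extra"): 54.0,
}


def estimated_hours(test_set, folding, output_gb):
    """Estimate the run time in hours, assuming linear scaling with output."""
    return SECONDS_PER_GB[(test_set, folding)] * output_gb / 3600.0


for (test_set, folding) in SECONDS_PER_GB:
    hours = estimated_hours(test_set, folding, 1024)
    print(f"{test_set:8s} {folding:8s} 1 TB: {hours:5.1f} hours")
```

With these figures 4 TB on the normal test set with standard folding and 1 TB on the expanded test set with extra folding both come to about 15.4 hours.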
The documentation notes about the extra foldings:
{noformat}
"This may cause excessive amounts of memory to be used on longer data streams"
{noformat}
Example memory usage I have noted using the resident set size (RSS) of the 
process when run using the core tests (-ts 0):
||Folding||Output (TB)||Mode||Resident Set Size (KB)||GB||
|1|1|32-bit|732792|0.733|
|2|1|32-bit|1271424|1.27|
|1|1|64-bit|800148|0.800|
|2|1|64-bit|1304116|1.30|
|1|4|32-bit|1239440|1.24|
|2|4|32-bit|2314232|2.31|
|1|4|64-bit|1324436|1.32|
|2|4|64-bit|?|?|

So the 64-bit smart folding (-tf 1) requires a bit more memory than the 32-bit 
mode. The extended foldings (-tf 2) use roughly 60-75% more memory for a 
maximum of 1 TB and nearly 2-fold more memory at 4 TB. I do not have a trial 
for extended folding in 64-bit mode at 4 TB but since the mode is ignored (all 
foldings are always done) it will probably be approximately the same.
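
The relative memory cost of the extended foldings can be computed directly from the table above. The dictionary layout and function name below are illustrative only.

```python
# Memory in KB copied from the table: (folding, output TB, mode bits) -> KB.
MEMORY_KB = {
    (1, 1, 32): 732792,
    (2, 1, 32): 1271424,
    (1, 1, 64): 800148,
    (2, 1, 64): 1304116,
    (1, 4, 32): 1239440,
    (2, 4, 32): 2314232,
    (1, 4, 64): 1324436,
}


def extended_to_smart_ratio(output_tb, mode_bits):
    """Ratio of memory used by -tf 2 to memory used by -tf 1."""
    extended = MEMORY_KB[(2, output_tb, mode_bits)]
    smart = MEMORY_KB[(1, output_tb, mode_bits)]
    return extended / smart


for output_tb, mode_bits in [(1, 32), (1, 64), (4, 32)]:
    ratio = extended_to_smart_ratio(output_tb, mode_bits)
    print(f"{output_tb} TB, {mode_bits}-bit: {ratio:.2f}x")
```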
h1. Results

In the following results the number is the size of the output where failure 
occurred expressed as a power of 2. Hence higher is better. No failure is 
marked with a dash.
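
The conversion between the power of 2 and the size shown in the last column of each table is a simple split of the exponent into a binary unit and a remainder. A minimal sketch (the function name is illustrative):

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_power_of_two(exponent):
    """Format 2**exponent bytes using binary units, e.g. 19 -> '512 KiB'."""
    unit_index, remainder = divmod(exponent, 10)
    return f"{2**remainder} {UNITS[unit_index]}"


for exponent in (19, 21, 36, 42):
    print(exponent, "->", format_power_of_two(exponent))
```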
h2. PractRand v0.93 -tf 2 -ts 1 -tlmax 1TB:

Extended foldings.
Extended tests.
{noformat}
RNG                     PractRand       ∩      
JDK                     19              512 KiB
WELL_512_A              21              2 MiB  
WELL_1024_A             24              16 MiB 
WELL_19937_A            36              64 GiB 
WELL_19937_C            36              64 GiB 
WELL_44497_A            39              512 GiB
WELL_44497_B            39              512 GiB
MT                      32              4 GiB  
ISAAC                   -                      
SPLIT_MIX_64            -                      
XOR_SHIFT_1024_S        28              256 MiB
TWO_CMRES               21              2 MiB  
MT_64                   36              64 GiB 
MWC_256                 -                      
KISS                    -                      
XOR_SHIFT_1024_S_PHI    30              1 GiB  
XO_RO_SHI_RO_64_S       18              256 KiB
XO_RO_SHI_RO_64_SS      -                      
XO_SHI_RO_128_PLUS      21              2 MiB  
XO_SHI_RO_128_SS        -                      
XO_RO_SHI_RO_128_PLUS   22              4 MiB  
XO_RO_SHI_RO_128_SS     -                      
XO_SHI_RO_256_PLUS      24              16 MiB 
XO_SHI_RO_256_SS        -                      
XO_SHI_RO_512_PLUS      27              128 MiB
XO_SHI_RO_512_SS        -                      
PCG_XSH_RR_32           -                      
PCG_XSH_RS_32           -                      
PCG_RXS_M_XS_64         -                      
PCG_MCG_XSH_RR_32       -                      
PCG_MCG_XSH_RS_32       -                      
MSWS                    -                      
SFC_32                  -                      
SFC_64                  -                      
JSF_32                  -                      
JSF_64                  -                      
XO_SHI_RO_128_PP        -                      
XO_RO_SHI_RO_128_PP     -                      
XO_SHI_RO_256_PP        -                      
XO_SHI_RO_512_PP        -                      
XO_RO_SHI_RO_1024_PP    -                      
XO_RO_SHI_RO_1024_S     30              1 GiB  
XO_RO_SHI_RO_1024_SS    -  
{noformat}
This test was aborted before multiple trials were completed, so only a single 
trial is shown. The extended test set with extra folding increases run time 
considerably. It also increases memory usage to the extent that parallel jobs 
on the same machine can run out of memory as each task can use >2GB of RAM.
h2. PractRand v0.93 -tf 2 -ts 0 -tlmax 1TB:

Extended foldings.
Core tests.
{noformat}
RNG                     PractRand       ∩      
JDK                     18,19,19,19,19  512 KiB
WELL_512_A              24,24,24,24,24  16 MiB 
WELL_1024_A             27,27,27,27,27  128 MiB
WELL_19937_A            39,39,39,39,39  512 GiB
WELL_19937_C            39,39,39,39,39  512 GiB
WELL_44497_A            -,-,-,-,-              
WELL_44497_B            -,-,-,-,-              
MT                      35,35,35,35,35  32 GiB 
ISAAC                   -,-,-,-,-              
SPLIT_MIX_64            -,-,-,-,-              
XOR_SHIFT_1024_S        31,31,31,31,31  2 GiB  
TWO_CMRES               21,21,21,21,21  2 MiB  
MT_64                   39,39,39,39,39  512 GiB
MWC_256                 -,-,-,-,-              
KISS                    -,-,-,-,-              
XOR_SHIFT_1024_S_PHI    33,33,33,33,33  8 GiB  
XO_RO_SHI_RO_64_S       21,21,21,21,21  2 MiB  
XO_RO_SHI_RO_64_SS      -,-,-,-,-              
XO_SHI_RO_128_PLUS      24,24,24,24,24  16 MiB 
XO_SHI_RO_128_SS        -,-,-,-,-              
XO_RO_SHI_RO_128_PLUS   25,25,25,25,25  32 MiB 
XO_RO_SHI_RO_128_SS     -,-,-,-,-              
XO_SHI_RO_256_PLUS      27,27,27,27,27  128 MiB
XO_SHI_RO_256_SS        -,-,-,-,-              
XO_SHI_RO_512_PLUS      30,30,30,30,30  1 GiB  
XO_SHI_RO_512_SS        -,-,-,-,-              
PCG_XSH_RR_32           -,-,-,-,-              
PCG_XSH_RS_32           -,-,-,-,-              
PCG_RXS_M_XS_64         -,-,-,-,-              
PCG_MCG_XSH_RR_32       -,-,-,-,-              
PCG_MCG_XSH_RS_32       -,-,-,-,-              
MSWS                    -,-,-,-,-              
SFC_32                  -,-,-,-,-              
SFC_64                  -,-,-,-,-              
JSF_32                  -,-,-,-,-              
JSF_64                  -,-,-,-,-              
XO_SHI_RO_128_PP        -,-,-,-,-              
XO_RO_SHI_RO_128_PP     -,-,-,-,-              
XO_SHI_RO_256_PP        -,-,-,-,-              
XO_SHI_RO_512_PP        -,-,-,-,-              
XO_RO_SHI_RO_1024_PP    -,-,-,-,-              
XO_RO_SHI_RO_1024_S     33,33,33,33,33  8 GiB  
XO_RO_SHI_RO_1024_SS    -,-,-,-,-
{noformat}
This test contains multiple trials. However note that:
 * Almost all failures occurred using foldings that are present in the default 
smart folding set
 * The WELL_44497 generators do not fail. These are known to fail TestU01 
BigCrush.

Here are failures from the extended folding set:
{noformat}
MT                      PractRand       32 GiB                    
MT                      PractRand       [Low4/16]BRank(12):6K(1)  
TWO_CMRES               PractRand       2 MiB                     
TWO_CMRES               PractRand       [Low4/32]Gap-16:A 
{noformat}
h2. PractRand v0.94 -tf 1 -ts 0 -tlmax 4TB:

Smart foldings.
Core tests.

Note: These are the default settings but reduced from 32TB of output to 4TB. 
Using the smart folding and core test set allows the test to execute twice as 
fast so a higher maximum output was possible.

This is a PractRand version switch from v0.93 to v0.94. The v0.94 distribution 
download does not build on Linux so I initially tested using v0.93. I then 
created a patch to fix v0.94 to allow testing using the newer version. The 
patch has been added to the examples-stress src folder.
{noformat}
RNG                     PractRand       ∩      
JDK                     20,20,20        1 MiB  
WELL_512_A              24,24,24        16 MiB 
WELL_1024_A             27,27,27        128 MiB
WELL_19937_A            39,39,39        512 GiB
WELL_19937_C            39,39,39        512 GiB
WELL_44497_A            42,42,42        4 TiB  
WELL_44497_B            42,42,42        4 TiB  
MT                      38,38,38        256 GiB
ISAAC                   -,-,-                  
SPLIT_MIX_64            -,-,-                  
XOR_SHIFT_1024_S        31,31,31        2 GiB  
TWO_CMRES               32,32,32        4 GiB  
MT_64                   39,39,39        512 GiB
MWC_256                 -,-,-                  
KISS                    -,-,-                  
XOR_SHIFT_1024_S_PHI    33,33,33        8 GiB  
XO_RO_SHI_RO_64_S       21,21,21        2 MiB  
XO_RO_SHI_RO_64_SS      -,-,-                  
XO_SHI_RO_128_PLUS      24,24,24        16 MiB 
XO_SHI_RO_128_SS        -,-,-                  
XO_RO_SHI_RO_128_PLUS   25,25,25        32 MiB 
XO_RO_SHI_RO_128_SS     -,-,-                  
XO_SHI_RO_256_PLUS      27,27,27        128 MiB
XO_SHI_RO_256_SS        -,-,-                  
XO_SHI_RO_512_PLUS      30,30,30        1 GiB  
XO_SHI_RO_512_SS        -,-,-                  
PCG_XSH_RR_32           -,-,-                  
PCG_XSH_RS_32           41,-,-                 
PCG_RXS_M_XS_64         -,-,-                  
PCG_MCG_XSH_RR_32       -,-,-                  
PCG_MCG_XSH_RS_32       40,41,41        2 TiB  
MSWS                    -,-,-                  
SFC_32                  -,-,-                  
SFC_64                  -,-,-                  
JSF_32                  -,-,-                  
JSF_64                  -,-,-                  
XO_SHI_RO_128_PP        -,-,-                  
XO_RO_SHI_RO_128_PP     -,-,-                  
XO_SHI_RO_256_PP        -,-,-                  
XO_SHI_RO_512_PP        -,-,-                  
XO_RO_SHI_RO_1024_PP    -,-,-                  
XO_RO_SHI_RO_1024_S     33,33,33        8 GiB  
XO_RO_SHI_RO_1024_SS    -,-,-           
{noformat}
Note that with the switch to v0.94 the core test suite was updated. The test 
where JDK failed in v0.93 has been removed. Thus JDK gets to twice the output 
before failing a different test (still at 4 seconds). This test was replaced by 
a new test that mainly targets LCGs. Interestingly this makes PCG XSH RS fail 
once at 2TB. The variant PCG MCG XSH RS systematically fails at 2TB.
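
The distinction between a single failure and a systematic failure matches how the last column of the tables appears to be filled in: a size is listed only when every trial failed. This reading is inferred from the table rows, not stated in the PractRand output, and the function below is a sketch under that assumption.

```python
def systematic_failure(trials):
    """Return the power of 2 by which every trial has failed, or None.

    trials holds the failure exponent for each trial, with None for a trial
    that did not fail (shown as a dash in the tables).
    """
    if any(t is None for t in trials):
        return None
    return max(trials)


print(systematic_failure([40, 41, 41]))      # PCG_MCG_XSH_RS_32
print(systematic_failure([41, None, None]))  # PCG_XSH_RS_32
```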

Also note that the failures that occurred on extended folding are now observed 
later on the smart foldings:
{noformat}
MT           32 GiB -> 256 GiB
TWO_CMRES     2 MiB ->   4 GiB
{noformat}
In particular TWO_CMRES produces about 2000 times (2^11) as much output before 
failing. The output size is still relatively small.
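
The gain for each generator follows from the sizes listed above. A short check (names are illustrative):

```python
MIB = 2**20
GIB = 2**30

# (failure size with extended folding, failure size with smart folding)
FAILURE_SIZES = {
    "MT": (32 * GIB, 256 * GIB),
    "TWO_CMRES": (2 * MIB, 4 * GIB),
}


def output_gain(name):
    """Factor by which the output before failure increased."""
    before, after = FAILURE_SIZES[name]
    return after // before


for name in FAILURE_SIZES:
    print(name, output_gain(name))
```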

The WELL_44497 generators now fail, identifying problems that are also found by 
TestU01 BigCrush.
h1. Conclusion

I tried a few variants of testing. Using extended folding was much slower and 
failed few generators that would not fail anyway on the smart foldings. 
Using the extended test suite is not recommended by the author. The tests are 
more comprehensive but are not orthogonal. There is a lot of overlap. The extra 
tests make the run-time longer and significantly increase the memory overhead. 
The extra tests only seem to make the MT generator fail much faster. Using the 
smart folding with the core tests allows running to 4TB of output in the same 
length of time as 1TB with extended folding and extra tests. At 4TB we see 
failures of the WELL_44497 generators and PCG_MCG_XSH_RS so this extra length 
is useful.

I only did 3 runs. Computation time is large compared to TestU01 BigCrush. In 
contrast to TestU01 the entire output of the generator is put through every 
test. I believe it is a better use of resources to try to extend the run-time 
to longer output rather than to include more runs at the same length. The seed 
becomes largely irrelevant when 4TB of output is generated (i.e. 2^39 long 
values). So I can commit the current results to the user guide and then spend a 
few weeks running PractRand to 32TB output for a single trial run to see what 
happens.
h1. Stats

The total testing time for each of the test suites in the current set of 
results is:
||Tests||Total||Cores||Notes||
|Dieharder|13 days 19:05:49.59|2|Workstation 1 |
|TestU01 BigCrush|47 days 00:50:11.79|2|Workstation 1 |
|PractRand|35 days 12:43:34.94|4|Workstation 1 for some generators; Workstation 
2 for the other generators. 
  
 The second machine runs approximately 2-fold faster than workstation 1. This 
may be due to improved hardware or the switch from g++ 4 to g++ 5 which updates 
support from C++11 to C++14.|

Note that to work around the high memory usage of PractRand (which uses up to 
1.3GB per job when run to 4TB of output on default settings) I use the 
multi-threaded mode of PractRand. It can use up to 5 threads but usage is 
variable and there is an approximate 3-fold increase in speed when 
multi-threaded. Thus results have been generated using 3 threads per testing 
process. This allows long-running parallel jobs on a benchmarking machine to 
avoid saturating memory.
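
The trade-off can be sketched with the two figures quoted above (1.3GB per job and an approximate 3-fold speed-up). The model is deliberately simple and is an assumption: throughput is counted in units of one single-threaded job, and memory is assumed to be the same for a single-threaded and a multi-threaded job.

```python
MEMORY_PER_JOB_GB = 1.3        # quoted above for 4TB of output, default settings
MULTITHREADED_SPEEDUP = 3.0    # approximate speed-up quoted above


def memory_and_throughput(jobs, multithreaded):
    """Return (total memory in GB, throughput relative to one 1-thread job)."""
    memory_gb = jobs * MEMORY_PER_JOB_GB
    speed = MULTITHREADED_SPEEDUP if multithreaded else 1.0
    return memory_gb, jobs * speed


# Three single-threaded jobs versus one multi-threaded job.
print(memory_and_throughput(3, multithreaded=False))
print(memory_and_throughput(1, multithreaded=True))
```

Under these assumptions one multi-threaded job gives the same throughput as three single-threaded jobs for a third of the memory.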

 

> PractRand
> ---------
>
>                 Key: RNG-86
>                 URL: https://issues.apache.org/jira/browse/RNG-86
>             Project: Commons RNG
>          Issue Type: Wish
>          Components: examples
>            Reporter: Gilles Sadowski
>            Priority: Minor
>
> Integrate another test suite to the {{RandomStressTester}} application:
>  [http://pracrand.sourceforge.net/]
> The library also contains many RNG implementations (C++).
> FTR: https://markmail.org/message/74zmora4jrhwb5hu



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
