Hi Peter, 

> The long answer will follow.
here it is:
LON-CAPA uses Math::Random for random number functionality. Many of the 
(documented) functions for authors - which are essentially wrappers around 
Math::Random functions - require the author to pass in a seed. This seed is 
actually used as a phrase passed to Math::Random::set_seed_from_phrase(). 
set_seed_from_phrase() then generates the real seed from this phrase which is 
used to initialise the random number generator (RNG). The real seed consists of 
two integer components and represents the initial state of a linear 
congruential generator. 
The way random number functionality is offered to authors in LON-CAPA requires 
to pass in a seed every time one of the functions is used. This is different to 
common usage of RNGs: the RNG is initialised only once and then generates 
numbers according to the promised distribution. So in LON-CAPA we effectively 
use the second state of a RNG every time we request RNG functions. Because of 
the way LCGs work, the distribution of “second states” depends on the 
distribution of initial states - garbage in, garbage out applies again. 

With this in mind, let’s look at the data. The following graphs show density 
histograms of the two components of the seeds that are generated by 
set_seed_from_phrase(). Blue means low frequency, yellowish means high 
frequency of the respective combination.

The first histogram depicts the seed frequency when using your first approach 
($seed = random(1,1E10,1)) for 10^6 samples.

[removed: it seems that the mailing list doesn’t support attached images. see 
http://public.ostfalia.de/~droescst/seedhist1.png]

One can clearly see that there’s an imbalance or bias. The desired/ideal result 
would be an area of equal color, corresponding to a uniform distribution in 
both components. This explains your observations.

If you look at phrtsd in randlib.c (the underlying library of Math::Random, 
phrtsd being the function phrase -> seed), you’ll find that the function 
expects the phrase to be a string of @SEEDCHARS (see previous mail). If you 
don’t use all of the characters (e.g. only decimals), you’ll end up with the 
situation depicted in the first histogram. 

This problem is easy to solve: create your phrase from a random sampling of all 
the available chars. The second histogram depicts this approach (10^6 samples), 
which corresponds to the code I suggested earlier. One can still see a slight 
imbalance at the “borders”, which I can not explain at the moment. But the 
overall situation is much better. 

[removed: it seems that the mailing list doesn’t support attached images. see 
http://public.ostfalia.de/~droescst/seedhist2.png]

Personally, I would not use random_permutation if a uniform distribution of 
permutations is a requirement (which in most use cases is not), because of the 
issues with “seed management”. Instead I’d use something along the lines of 
your shuffle or this:

sub permute2{
    my @list = @_;
    my @ret;
    while(scalar @list){
        push @ret, splice(@list, random(0,scalar @list,1), 1);
    }
    return @ret;
}  

which should be a little more efficient. 

Stefan

On 01 Sep 2014, at 12:45, Stefan Dröschler <st.droesch...@ostfalia.de> wrote:

> Hi Peter, 
> 
>> The latter shows a remarkable imbalance. 
>> This seems to be a bias as it remains stable if choosing other randomisation.
> The short answer is: the RNG is not well-fed the way you generate the seeds.
> The bias results from the distribution of the “effective seeds”. 
> 
> A better albeit not perfect (!!) way would be to add:
> 
> my @SEEDCHARS = split "", 
> q{abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()_+[];:'\"<>?,./};
>  
> sub gen_seed {
>    my $ret = "";
>    $ret .= $SEEDCHARS[random(0,$#SEEDCHARS, 1)] for (1..20); 
>    return $ret;
> }   
> 
> and replace    
> my $seed = random( 1, 1E10, 1 ); 
> with
> my $seed = gen_seed();
> 
> You’ll notice that the resulting permutations are much more uniformly 
> distributed. (And you also might run into “code ran too long”).
> Note: this approach is using knowledge about the implementation of the 
> underlying libraries.
> 
> The long answer will follow.
> 
> Stefan
> 
> 
> On 31 Aug 2014, at 17:52, Peter Dencker <denc...@math.uni-luebeck.de> wrote:
> 
>> 
>> Hi.
>> 
>> The problem given below compares two procedures to permute lists: sub
>> 'shuffle' is a Fisher-Yates-shuffle procedure and sub 'permute' directly
>> uses the script function 'random_permutation'. The latter shows a
>> remarkable imbalance. This seems to be a bias as it remains stable if
>> choosing other randomization.
>> 
>> All my tries to choose the $seed give a similar behavior.
>> What is the right method?
>> 
>> - Peter
>> 
>> <problem>
>> 
>> <script type="loncapa/perl">
>> 
>> our $N = 10000;
>> 
>> sub shuffle {
>>   my @values = @_;
>>   return
>>       if !@values;
>> 
>>   for my $i ( 0 .. $#values - 1 ) {
>>       my $j = random( $i, $#values, 1 );
>>       @values[ $i, $j ] = @values[ $j, $i ];
>>   }
>>   return @values;
>> }
>> 
>> sub permute {
>>   my @values = @_;
>>   return
>>       if !@values;
>> 
>>   my $seed = random( 1, 1E10, 1 );
>>   return random_permutation( $seed, @values );
>> }
>> 
>> our $tables;
>> for ( 'B' .. 'E' ) {
>>   my @items = 'A' .. $_;
>>   $tables .= '$N random permutations of '
>>           . ( join q{}, @items ) . ' counted: <br />';
>>   my %number;
>>   for ( 1 .. $N ) {
>>       $number{'permute'}{ join q{}, permute @items }++;
>>       $number{'shuffle'}{ join q{}, shuffle @items }++;
>>   }
>>   $tables
>>     .= '<table border=1><tr><td>permutation</td>'
>>     . '<td>shuffle</td>'
>>     . '<td>permute</td></tr>'
>>     . '<tr><td align="center">'
>>     . join '</td></tr><tr><td align="center">', map {
>>         join '</td><td align="right">',
>>              $_, $number{'shuffle'}{$_},
>>              $number{'permute'}{$_};
>>       } sort { $a cmp $b }
>>         keys %{ $number{'shuffle'} };
>>   $tables .= '</td></tr></table>';
>>   $tables .= '<br />' x 2;
>> }
>> 
>> </script>
>> 
>> $tables
>> 
>> <customresponse id="0r0">
>> <answer type="loncapa/perl">
>>     return 'SUBMITTED';
>> </answer>
>> </customresponse>
>> 
>> </problem>
>> 
>> -- 
>> Dr. Peter Dencker
>>   wissenschaftl. Mitarbeiter
>> 
>> UNIVERSITÄT ZU LÜBECK
>>   INSTITUT FÜR MATHEMATIK
>> 
>>   Ratzeburger Allee 160
>>   23562 Lübeck
>> 
>>   Tel +49 451 500 4254
>>   Fax +49 451 500 3373
>>   denc...@math.uni-luebeck.de
>> 
>>   www.math.uni-luebeck.de
>> _______________________________________________
>> LON-CAPA-users mailing list
>> LON-CAPA-users@mail.lon-capa.org
>> http://mail.lon-capa.org/mailman/listinfo/lon-capa-users
> 
> _______________________________________________
> LON-CAPA-users mailing list
> LON-CAPA-users@mail.lon-capa.org
> http://mail.lon-capa.org/mailman/listinfo/lon-capa-users

_______________________________________________
LON-CAPA-users mailing list
LON-CAPA-users@mail.lon-capa.org
http://mail.lon-capa.org/mailman/listinfo/lon-capa-users

Reply via email to