Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-28 Thread Matti Nykyri
On Jun 28, 2014, at 0:13, Matti Nykyri matti.nyk...@iki.fi wrote:

 On Jun 27, 2014, at 0:00, Kai Krakow hurikha...@gmail.com wrote:
 
 Matti Nykyri matti.nyk...@iki.fi schrieb:
 
 If you are looking a mathematically perfect solution there is a simple
 one even if your list is not in the power of 2! Take 6 bits at a time of
 the random data. If the result is 62 or 63 you will discard the data and
 get the next 6 bits. This selectively modifies the random data but keeps
 the probabilities in correct balance. Now the probability for index of
 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
 
 Why not do just something like this?
 
 index = 0;
 while (true) {
 index = (index + get_6bit_random()) % 62;
 output  char_array[index];
 }
 
 Done, no bits wasted. Should have perfect distribution also. We also don't 
 have to throw away random data just to stay within unaligned boundaries. The 
 unalignment is being taken over into the next loop so the error corrects 
 itself over time (it becomes distributed over the whole set).
 
 Distribution will not be perfect. The same original problem persists. 
 Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. 
 Now the addition changes this so that index 0 to 1 reflects to previous 
 character and not the original index.
 
 The distribution of like 10GB of data should be quite even but not on a small 
 scale. The next char will depend on previous char. It is 100% more likely 
 that the next char is the same or one index above the previous char then any 
 of the other ones in the series. So it is likely that you will have long sets 
 of same character.
 
 Random means that for next char the probability is always even, 1/62. And 
 like mentioned in Dilbert it is impossible to say that something is random 
 but possible to say that it isn't.
 
 If wasting 6bit of data seems large, do this:
 
 index = get_6bit_random();
 while (index  61) {
 index = 1;
 index |= get_1bit_random();
 index = 0x3F;
 }
 return index;
 
 It will waste 1 bit at a time until result is less than 62. This will 
 slightly change probabilities though :/

Sorry this example is really flawed :( If next6bit is over 61 there are only 
two possible values for it: 62 or 63 - that is 0x3E and 0x3F. So you see that 
only one bit changes. But that bit is random! So least significant bit is 
random and does not need to be discarded :)

index = get_6bit_random();
while (index  61) {
index = 5;
index |= get_5bit_random();
index = 0x3F;
}
return index;




[gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-28 Thread »Q«
On Sat, 28 Jun 2014 19:53:08 -0500
Canek Peláez Valdés can...@gmail.com wrote:

 On Sat, Jun 28, 2014 at 7:37 PM,  gottl...@nyu.edu wrote:
  On Sat, Jun 28 2014, Canek Peláez Valdés wrote:
 
  That doesn't matter. Take a non-negative integer N; if you flip a
  coin an infinite number of times, then the probability of the coin
  landing on the same face N times in a row is 1.
 
  This is certainly true.
 
  This means that it is *guaranteed* to happen
 
  That is not as clear.
 
 Let me be more precise (and please correct me if I'm wrong): It is
 guaranteed to happen at some point in the infinite sequence of random
 flip coins, but we cannot know when it will happen, only that it will
 happen.
 
 That's the way I got it when I took my probability courses, admittedly
 many years ago.

The probability is 1 in the sense that the as the number of flips M
increases, so does the probability of getting N heads (or tails) in a
row also increases, and the upper bound for the sequence of
probabilities is 1.  It's not a probability about something which
actually happens;  no one so far has been able to flip a coin an
infinite number of times, not even a computer.

 In any way, even if I'm wrong and it is not guaranteed, the main point
 remains true: the probability of getting a large sequence of the same
 number from a RNG is 1 for every true random RNG, and therefore seeing
 a large sequence of the same number form a RNG doesn't (technically)
 means that it is broken.

It's true that that wouldn't *prove* the generator is broken.  But it
might be a good reason to take another look at the algorithm.
 
 Regards.






Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-28 Thread Canek Peláez Valdés
On Sat, Jun 28, 2014 at 8:46 PM, »Q« boxc...@gmx.net wrote:
 On Sat, 28 Jun 2014 19:53:08 -0500
 Canek Peláez Valdés can...@gmail.com wrote:

 On Sat, Jun 28, 2014 at 7:37 PM,  gottl...@nyu.edu wrote:
  On Sat, Jun 28 2014, Canek Peláez Valdés wrote:
 
  That doesn't matter. Take a non-negative integer N; if you flip a
  coin an infinite number of times, then the probability of the coin
  landing on the same face N times in a row is 1.
 
  This is certainly true.
 
  This means that it is *guaranteed* to happen
 
  That is not as clear.

 Let me be more precise (and please correct me if I'm wrong): It is
 guaranteed to happen at some point in the infinite sequence of random
 flip coins, but we cannot know when it will happen, only that it will
 happen.

 That's the way I got it when I took my probability courses, admittedly
 many years ago.

 The probability is 1 in the sense that the as the number of flips M
 increases, so does the probability of getting N heads (or tails) in a
 row also increases, and the upper bound for the sequence of
 probabilities is 1.  It's not a probability about something which
 actually happens;  no one so far has been able to flip a coin an
 infinite number of times, not even a computer.

And no one will. Ever.

 In any way, even if I'm wrong and it is not guaranteed, the main point
 remains true: the probability of getting a large sequence of the same
 number from a RNG is 1 for every true random RNG, and therefore seeing
 a large sequence of the same number form a RNG doesn't (technically)
 means that it is broken.

 It's true that that wouldn't *prove* the generator is broken.  But it
 might be a good reason to take another look at the algorithm.

Agreed.

Regards.
-- 
Canek Peláez Valdés
Profesor de asignatura, Facultad de Ciencias
Universidad Nacional Autónoma de México



Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-27 Thread thegeezer
On 06/26/2014 11:07 PM, Kai Krakow wrote:

 It is worth noting that my approach has the tendency of generating random 
 characters in sequence.

sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/



Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-27 Thread Matti Nykyri
 On Jun 27, 2014, at 11:55, thegeezer thegee...@thegeezer.net wrote:
 
 On 06/26/2014 11:07 PM, Kai Krakow wrote:
 
 It is worth noting that my approach has the tendency of generating random 
 characters in sequence.
 
 sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/
 

This is a good one :) have really been thinking this same comic previosly when 
writing to this thread...


Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-27 Thread Matti Nykyri
 On Jun 27, 2014, at 0:00, Kai Krakow hurikha...@gmail.com wrote:
 
 Matti Nykyri matti.nyk...@iki.fi schrieb:
 
 If you are looking a mathematically perfect solution there is a simple
 one even if your list is not in the power of 2! Take 6 bits at a time of
 the random data. If the result is 62 or 63 you will discard the data and
 get the next 6 bits. This selectively modifies the random data but keeps
 the probabilities in correct balance. Now the probability for index of
 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
 
 Why not do just something like this?
 
 index = 0;
 while (true) {
  index = (index + get_6bit_random()) % 62;
  output  char_array[index];
 }
 
 Done, no bits wasted. Should have perfect distribution also. We also don't 
 have to throw away random data just to stay within unaligned boundaries. The 
 unalignment is being taken over into the next loop so the error corrects 
 itself over time (it becomes distributed over the whole set).

Distribution will not be perfect. The same original problem persists. 
Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. Now 
the addition changes this so that index 0 to 1 reflects to previous character 
and not the original index.

The distribution of like 10GB of data should be quite even but not on a small 
scale. The next char will depend on previous char. It is 100% more likely that 
the next char is the same or one index above the previous char then any of the 
other ones in the series. So it is likely that you will have long sets of same 
character.

Random means that for next char the probability is always even, 1/62. And like 
mentioned in Dilbert it is impossible to say that something is random but 
possible to say that it isn't.

If wasting 6bit of data seems large, do this:

index = get_6bit_random();
while (index  61) {
 index = 1;
 index |= get_1bit_random();
 index = 0x3F;
}
return index;

It will waste 1 bit at a time until result is less than 62. This will slightly 
change probabilities though :/


[gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-26 Thread Kai Krakow
Matti Nykyri matti.nyk...@iki.fi schrieb:

 If you are looking a mathematically perfect solution there is a simple
 one even if your list is not in the power of 2! Take 6 bits at a time of
 the random data. If the result is 62 or 63 you will discard the data and
 get the next 6 bits. This selectively modifies the random data but keeps
 the probabilities in correct balance. Now the probability for index of
 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.

Why not do just something like this?

index = 0;
while (true) {
  index = (index + get_6bit_random()) % 62;
  output  char_array[index];
}

Done, no bits wasted. Should have perfect distribution also. We also don't 
have to throw away random data just to stay within unaligned boundaries. The 
unalignment is being taken over into the next loop so the error corrects 
itself over time (it becomes distributed over the whole set).

-- 
Replies to list only preferred.




[gentoo-user] Re: OT: Mapping random numbers (PRNG)

2014-06-26 Thread Kai Krakow
Kai Krakow hurikha...@gmail.com schrieb:

 Matti Nykyri matti.nyk...@iki.fi schrieb:
 
 If you are looking a mathematically perfect solution there is a simple
 one even if your list is not in the power of 2! Take 6 bits at a time of
 the random data. If the result is 62 or 63 you will discard the data and
 get the next 6 bits. This selectively modifies the random data but keeps
 the probabilities in correct balance. Now the probability for index of
 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
 
 Why not do just something like this?
 
 index = 0;
 while (true) {
   index = (index + get_6bit_random()) % 62;
   output  char_array[index];
 }
 
 Done, no bits wasted. Should have perfect distribution also. We also don't
 have to throw away random data just to stay within unaligned boundaries.
 The unalignment is being taken over into the next loop so the error
 corrects itself over time (it becomes distributed over the whole set).

It is worth noting that my approach has the tendency of generating random 
characters in sequence. I think there are several possibilities to fix this, 
one being swapping two elements in the char array on each access. That fixed 
this particular problem in my tests. Here's are quick ruby snippet I played 
with:

$ cat random.rb
# danger: ugly code ahead 
dist = [0] * 62
seq = [0, 0]
chars = [*('A'..'Z'), *('a'..'z'), *('0'..'9')]
orig_chars = chars.dup

index = 0
old = nil
(62 * 10).times do
  # generate random wrapped index
  index = (index + rand(64)) % 62

  # swap two indexed positions and output the char
  chars[0], chars[index] = chars[index], chars[0]
  $stdout.write chars[0]

  # count the generated indexes
  dist[index] = dist[index] + 1

  # count if indexes were generated in sequence by looking
  # them up in the original sorted chars array
  unless old.nil?
seq[0] = seq[0] + 1 if old  orig_chars.index(chars[0])
seq[1] = seq[1] + 1 if old  orig_chars.index(chars[0])
  end
  old = orig_chars.index(chars[0])
end
puts

# check the average, it should always equal the loop count
# multiplicator
avg = dist.inject(:+) / 62.0

# calculate if the algorithm preferred sequences
seq = [*seq, seq[0] - seq[1]]

# output some stats
puts avg
puts dist.inspect
puts dist.map {|v| avg - v }.inspect
puts seq.inspect


-- 
Replies to list only preferred.