Here is my solution:
For a simple case, I'll use an example with N=3 and the probability
distribution {0.25, 0.3, 0.45}.
I want my event generator to produce 0 with probability 0.25, 1 with
probability 0.3, and 2 with probability 0.45.
I build a table of size N, with each cell being divided into two parts
by a partition value:
Table Index Left Partition Right
0 0 0.75 2
1 1 0.9 2
2 2 1.0 2
To generate an event from the desired distribution, I randomly pick
one of the three cells, and then generate a fraction on the range
0..1. If the fraction is less than that cell's partition, I return the
left value. Otherwise I return the right value.
You can verify that this results in the desired probabilities: We want
event 0 to occur with probability 0.25. Event 0 will occur when we
select cell 0 and the fraction value is less than 0.75. The
probability of that occurring is 1/3 * 3/4 = 1/4. For event 2 you have
to add up pieces from all three cells: 1/3*1/4 + 1/3*1/10 + 1/3*1 =
0.45.
Building the table is the only complicated thing that I have not yet
described. The objective is to have each event x occupying portions of
cells totaling N*px. I start off by assigning the left part of cell x
to event x and setting the partition to N*px. Then I pair up cells
where the partition is more than one with cells where the partition is
less than one, but which don't yet have a right component. I assign
the unused part of the second cell to the event of the first cell and
subtract that amount from the partition of the first cell. If you
continue that process at most N times, you get a valid table which
will produce the desired results. To avoid having to search through
the table to find cells which have excess or can take excess (which
would make the initilization O(N^2) or even O(N^3)), I use two stacks,
one to keep track of each kind of cell.
As an optimization to the generation process above, I actually store
the partition as an integer, 2^31-1 times the value shown above. Then
I just need to compare the random integer value to that value to
determine if I will return the left event or the right event. It
avoids one division and makes the comparison an integer compare rather
than a double precision floating point compare.
I have used this in simulation applications involving hundreds or even
thousands of possible events, and it works very nicely.
Don
On Sep 14, 9:32 am, Don <[email protected]> wrote:
> Given a pseudo-random generator which produces a uniform distribution
> from the range 0..2^32-1, build a generator which produces discrete
> samples from the integers 0..N-1 according to a user-specified
> probability distribution {P0, P1, P2, ... PN-1} (where Px is the
> probability of x being generated on any draw). Initialization of the
> generator may be of time complexity O(N) and the generator may have
> memory use of O(N), but each draw must be constant time O(1).
>
> Don
--
You received this message because you are subscribed to the Google Groups
"Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/algogeeks?hl=en.