Glen Barnett wrote:
[...]
> Note that since the width of wach bar is 1, this still has total
That should read "width of *each* bar"
> [Note that if you use table lookup, there's *two* leftover parts you
> need to generate - the tops of each bit of pmf not covered by the table,
> as well as the extreme right tail.]
Hmm, that's confusing.
What I mean is that the table lookup approach lets you generate from
an approximation to a piece of the discrete distribution over a finite
range - cut off to the right (so it generates values between 1 and k),
and with a tiny bit cut off the top. The cut-off bits need to be
generated
from separately.
Something like this. Let r be the probability to the right of k, for
some k.
(in what follows I write as if vectors, v(i) begin at v(1). If they
index from zero - as in C - adjust appropriately).
Let p(i) be the probability of getting i, i=1,...,k
Let a(i) = [256*p(i)] (where [] is the integer part of its argument).
Take an integer vector of length 256, v(1),...v(256). Fill the first
a(1) values of v (v(1),...,v(a(1)) with 1, the next v(2) values
(v(a(1)+1),...,v(a(1)+a(2)) with 2, and so on, up to v(k) values of k.
There will be a few cells left over. The filled values represent
a(i)/256. of the probability of getting each i from 1 to k.
Let t = Sum(p(i) - a(i)/256.)
Then t be the total of the tiny bits of probability off the top of each
histogram bar that fall off the top of the table lookup.
Now, if you like, fill [256*r] of the cells with a code to indicate
generating from the right tail (say -2), and [256*t] of them with a code
to indicate generating from the tops (say, -1). There will still be some
cells left over (at least there will be a cell left over), and there
will be some leftover probability for the right tail and top (r' and t',
say, where r' = r-[256*r]/256 and similarly for t'). Fill the remaining
cell or cell with some code to indicate you still need to decide between
generating from the tops or right tails (say, -3).
Now generate a random integer between 1 and 256 (or 0 and 255, depending
on how your arrays are indexed, as mentioned earlier). Call it j, say.
If v(j) is greater than zero, return v(j). If it's -1, generate from the
distribution with pmf
q(i) = {256*p(i) - a(i)} by whatever method (say, the squared histogram
method). If it's -2, generate from the right tail, r(i) = p(i)/r, i=k+1,
k+2, ... (say by accept/reject or some more efficient algorithm if
there's one to hand). If v(j)=-3, then with probability r'/(r'+t')
generate from q(i), otherwise generate from r(i).
Choosing a good k is not always trivial, but if a(k)=0, k was probably
too big. Similarly, if 256*p(k+1) is reasonably large (bigger than 2,
say), then k was probably too small. Aside from that, it doesn't matter
all than much - the general aim is to get as much of your generating
handled by returning values from v as you can.
Glen
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================