Dear list,
this is a summary of the replies I got to my question,
I want to generate a set of points in a plane, {(xi,yi)}, such that the
points are distributed according to a chosen probability
distribution density f, i.e., the infinitesimal rectangle [x,x+dx] X
[y,y+dy] shall contain f(x,y)*dx*dy points.
How do I do this ??
In other words, I am looking for a generalization of the 1-dim method,
where this is done with F*(z), where F=cumul(f) and F* = inverse of F, and
z = random out of [0,1] (or any appropriate interval if f is not
normalized).
I tried to do it the same way, just using complex numbers, but this does
not seem to work, and could also not be generalized for n>2 dimensions.
In statistical terms, I was looking for a way how to sample from a given
pdf.
Ted Harding (U.K.) and Ghandi Pawitan (Bandung, Indonesia) sent replies.
Thanks Ted and Ghandi !
- Ted Harding wrote:
The problem with "doing it the same way" is that "the same way" only
applies in 1 dimension (as a consequence of the result that F(X) has a
uniform distribution, so that F*(X) has the distribution of X). In more
than 1 dimension, you still only have the result that F(X,Y) has a uniform
distribution, and this is only 1 equation. There is no _straightforward_
way of basing it on two equations
(F1(X,Y), F2(X,Y)) = (Z1,Z1) where (Z1,Z2) is uniform on [0,1]x[0,1]
However, there are possible approaches. One is to use conditional
distributions: let F(X) be the marginal distribution for X, and let G(Y;x)
be the conditional distribution of Y given that X=x. Then both U=F(X) and,
for each x, V=G(Y;x) have uniform distributions, so provided you can
invert these as F*(U) and G*(V;x) you can then sample U=u,V=v and obtain
x=F*(u) and then y=G*(v;x).
The practical issues here depend a lot on the form of F(X,Y), since both
obtaining the conditional distribution and inverting
the functions F(X), G(Y;x) may be difficult in practice.
If it is straightforward to obtain the conditional distributions for Y
given X and X given Y, then Gibbs Sampling can enable you to sample from
(X,Y) without inverting the functions.
If you can't obtain the conditional distributions then it may be
impossible to follow such approaches, and then you may need to fall back
on simulating a random process with a known mechanism which has the
property of yielding (X,Y) distributed as f(x,y) -- if you know of such a
process!
If you could describe the density f(x,y) you are trying to sample from to
the list, then perhaps someone can suggest an approach.
- My reply:
I was also thinking on something like the conditional distribution and the
simulation approach, but this is not feasable in this case. What I have is
a potential field, V(x) = sum over i (a/r^m + b*r^n), with r=|x-xi|, x
and xi = 2-dim, and I want to assign a probability field to it, like p(x)
= 1/V(x). With many xi, this is beyond any easy tractability, I think.
- Ted Harding:
Yes, not tractable at all in the analytic sense.
However, this may be well suited to the Metropolis-Hastings algorithm
which is the basis of Markov Chain Monte Carlo.
Basically, suppose you want to sample from a density p(x) which is
proportional to a function g(x) (in your case g(x) = 1/V(x)).
Choose a "proposal distribution" Q(x) (in your case if sampling over a
rectangle it may be OK to take Q uniform, constant density, over the
rectangle).
For each point you intend to sample from p(x), choose a "starting value"
x(0) (it could be the same x(0) each time). Then:
Given any "current point" x(n), choose x(n+1) as follows:
x is sampled from Q(x) (in general from a distribution Q(x|x(n)) but this
may not be needed here)
With probability pA = min[1, (g(x)/g(x(n))*(Q(x(n))/Q(x)) ] in general
min[1, (g(x)/g(x(n)))*(Q(x(n)|x)/Q(x|x(n)) ])
adopt the value x as x(n+1); otherwise (with probability 1 - pA) stay with
x(n+1) = x(n).
Repeat the above many times. This generates a Markov Chain
x(0), x(1), ... , x(n), ...
whose equilibrium probability density is proportional to g(x), i.e. equal
to p(x).
Stop at a suitable point (basically, when n is large enough for the
distribution of X(n) to be sufficiently close to the
equilbrium). The main issue to be settled (which may require emprical
investigation) is: When is n "large enough"? The MCMC literature has a lot
of studies of "convergence diagnostics".
When you have stopped, at x(n) say, this result is a point sampled from
p(x) which you can then place in your region. Repeat all the above until
you have your desired sample.
- comment:
This was the hint I needed. The method works well, although probably some
caution is needed as convergence of the Markov Chain is concerned.
A google search with keyword "Metropolis Hastings algorithm" gave me a lot
of useful additional information.
- Ghandi Pawitan wrote:
you could try to look at Arbia's book, that is Spatial data configuration
in Statistical analysis of regional economics and related problem (1989)
published by Kluwer.
- comment:
This book is quite technical and not easy to read for a humble physicist
like me. It has however a short section on the problem. Furthermore it
gives extensive references. E.g. J.M. Chambers, Computational methods for
data analysis, John Wiley & Sons New York 1977, seems to be another good
source.
=================================================================
Peter Bossew
Institute of Physics and Biophysics,
University of Salzburg
home: A-1090 Vienna, Austria, Georg Sigl-Gasse 13/11, ph: +43-1-3177627
[EMAIL PROTECTED]
[EMAIL PROTECTED]
--
* To post a message to the list, send it to [EMAIL PROTECTED]
* As a general service to the users, please remember to post a summary of any useful
responses to your questions.
* To unsubscribe, send an email to [EMAIL PROTECTED] with no subject and "unsubscribe
ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND
Subscribe/Unsubscribe requests to the list
* Support to the list is provided at http://www.ai-geostats.org