Gibbs sampling

From Wikipedia, the free encyclopedia

In mathematics and physics, Gibbs sampling or Gibbs sampler is an algorithm to 
generate a sequence of samples from the joint probability distribution of two 
or more random variables. The purpose of such a sequence is to approximate the 
joint distribution, or to compute an integral (such as an expected value). 
Gibbs sampling is a special case of the Metropolis–Hastings algorithm, and thus 
an example of a Markov chain Monte Carlo algorithm. The algorithm is named 
after the physicist J. W. Gibbs, in reference to an analogy between the 
sampling algorithm and statistical physics. The algorithm was devised by 
brothers Stuart and Donald Geman, some eight decades after Gibbs's death.[1]
Gibbs sampling is applicable when the joint distribution is not known 
explicitly, but the conditional distribution of each variable is known. The 
Gibbs sampling algorithm generates an instance from the distribution of each 
variable in turn, conditional on the current values of the other variables. It 
can be shown (see, for example, Gelman et al. 1995) that the sequence of 
samples constitutes a Markov chain, and the stationary distribution of that 
Markov chain is just the sought-after joint distribution.
Gibbs sampling is particularly well-adapted to sampling the posterior 
distribution of a Bayesian network, since Bayesian networks are typically 
specified as a collection of conditional distributions.





Background
Gibbs sampling is a special case of the Metropolis–Hastings algorithm. The point of
Gibbs sampling is that, given a multivariate distribution, it is simpler to
sample from a conditional distribution than to marginalize by integrating over
a joint distribution. Suppose we want to obtain samples of a pair (x, y) from a joint
distribution p(x, y). We begin with some value y_0 and sample x_0 from the
conditional distribution p(x | y = y_0). Once that value of x is calculated, we
repeat by sampling the next y from p(y | x = x_0), and continue alternating
between the two conditionals.
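The alternating scheme above can be sketched in Python. As an illustrative choice (not from the text), the joint here is a standard bivariate Gaussian with correlation rho, whose full conditionals are themselves Gaussian: x | y ~ N(rho·y, 1 − rho²), and symmetrically for y | x.

```python
import random

def gibbs_bivariate_gaussian(rho, n_samples, burn_in=1000, seed=0):
    """Gibbs sampler for a standard bivariate Gaussian with correlation rho.

    Alternates between the exact full conditionals
    x | y ~ N(rho*y, 1 - rho**2) and y | x ~ N(rho*x, 1 - rho**2).
    """
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5   # conditional standard deviation
    x, y = 0.0, 0.0                 # arbitrary starting point
    samples = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # draw x from p(x | y = current y)
        y = rng.gauss(rho * x, sd)  # draw y from p(y | x = current x)
        if i >= burn_in:            # discard the burn-in period
            samples.append((x, y))
    return samples
```

Averaging over the returned samples recovers the moments of the joint: the sample means are near 0 and the sample correlation is near rho, even though the sampler only ever touched the two conditionals.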
Implementation
Suppose that a sample X is taken from a distribution depending on a parameter
vector θ = (θ_1, ..., θ_d) of length d, with prior distribution g(θ_1, ..., θ_d).
It may be that d is very large and that numerical integration to find the
marginal densities of the θ_i would be computationally expensive. Then an
alternative method of calculating the marginal densities is to create a Markov
chain on the space Θ by repeating these two steps:

Pick a random index j, 1 ≤ j ≤ d
Pick a new value for θ_j according to its conditional distribution under g
given the other components, g(θ_j | θ_1, ..., θ_{j−1}, θ_{j+1}, ..., θ_d)

These steps define a reversible Markov chain with the desired invariant
distribution g. This can be proved as follows. Define x ~_j y if x_i = y_i for
all i ≠ j, and let p_{xy} denote the probability of a jump from x to y. Then
the transition probabilities are

p_{xy} = (1/d) · g(y) / Σ_{z: z ~_j x} g(z)   if x ~_j y for some j,
p_{xy} = 0                                     otherwise.

So

g(x) p_{xy} = (1/d) · g(x) g(y) / Σ_{z: z ~_j x} g(z)
            = (1/d) · g(y) g(x) / Σ_{z: z ~_j y} g(z)
            = g(y) p_{yx}

since ~_j is an equivalence relation, so x ~_j y implies that the two sums over
z are identical. Thus the detailed balance equations are satisfied, implying
the chain is reversible and has invariant distribution g.
In practice, the index j is not chosen at random, and the chain cycles through
the indices in order. In general this gives a non-reversible chain, but it
will still have the desired invariant distribution (as long as the chain can
access all states under the fixed ordering).
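The detailed-balance argument can be checked numerically. In the sketch below (illustrative; the distribution g and its weights are arbitrary choices, not from the text), the random-scan transition probabilities p_{xy} are built on 2-bit vectors with exact rational arithmetic, and both the row sums and the detailed balance equations are verified:

```python
from fractions import Fraction
from itertools import product

d = 2  # number of components in each state vector

# A strictly positive target distribution g on 2-bit vectors
# (these weights are arbitrary illustrative values summing to 1).
g = {(0, 0): Fraction(4, 10), (0, 1): Fraction(3, 10),
     (1, 0): Fraction(2, 10), (1, 1): Fraction(1, 10)}

def neighbors(x, j):
    """All states z with z ~_j x, i.e. agreeing with x except perhaps at j."""
    out = []
    for v in (0, 1):
        z = list(x)
        z[j] = v
        out.append(tuple(z))
    return out

def p(x, y):
    """Random-scan Gibbs transition probability p_{xy}: pick index j
    uniformly, then resample coordinate j from its conditional under g."""
    total = Fraction(0)
    for j in range(d):
        if all(x[i] == y[i] for i in range(d) if i != j):
            total += Fraction(1, d) * g[y] / sum(g[z] for z in neighbors(x, j))
    return total

states = list(product((0, 1), repeat=d))
for x in states:
    assert sum(p(x, y) for y in states) == 1        # valid transition kernel
    for y in states:
        assert g[x] * p(x, y) == g[y] * p(y, x)     # detailed balance holds
```

Because Fraction arithmetic is exact, the assertions verify detailed balance identically rather than up to floating-point tolerance.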
Failure modes
There are two ways that Gibbs sampling can fail. The first is when there are 
islands of high-probability states, with no paths between them. For example, 
consider a probability distribution over 2-bit vectors, where the vectors (0,0) 
and (1,1) each have probability ½, but the other two vectors (0,1) and (1,0) 
have probability zero. Gibbs sampling will become trapped in one of the two 
high-probability vectors, and will never reach the other one. More generally, 
for any distribution over high-dimensional, real-valued vectors, if two 
particular elements of the vector are perfectly correlated (or perfectly 
anti-correlated), those two elements will become stuck, and Gibbs sampling will 
never be able to change them.
The second problem can happen even when all states have nonzero probability and
there is only a single island of high-probability states. For example, consider
a probability distribution over 100-bit vectors, where the all-zeros vector
occurs with probability ½, and all other vectors are equally probable, and so
have a probability of (1/2)/(2^100 − 1) each. If you want to estimate the
probability of the zero vector, it would be sufficient to take 100 or 1,000
samples from the true distribution; that would very likely give an answer very
close to ½. But you would probably have to take more than 2^100 samples from
Gibbs sampling to get the same result. No computer could do this in a lifetime.
This problem occurs no matter how long the burn-in period is. In the true
distribution, the zero vector occurs half the time, and those occurrences are
randomly mixed in with the nonzero vectors, so even a small sample will see
both zero and nonzero vectors. But Gibbs sampling will alternate between
returning only the zero vector for long periods (about 2^99 steps in a row),
then only nonzero vectors for long periods (about 2^99 steps in a row). Thus
convergence to the true distribution is extremely slow, requiring much more
than 2^99 steps; taking this many steps is not computationally feasible in a
reasonable time period. The slow convergence here can be seen as a consequence
of the curse of dimensionality.
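The stickiness of the zero vector can be quantified exactly. When the other n − 1 bits are all zero, a Gibbs update of a single bit compares only the zero vector with the corresponding unit vector, so the per-update escape probability works out to exactly 2^−n. A small exact calculation (illustrative code, not from the text) confirms this:

```python
from fractions import Fraction

def escape_probability(n):
    """Probability that a single Gibbs bit-update leaves the all-zeros
    vector, under the distribution where the zero vector has probability
    1/2 and each of the other 2**n - 1 vectors has probability
    (1/2) / (2**n - 1)."""
    p_zero = Fraction(1, 2)
    p_other = Fraction(1, 2) / (2**n - 1)
    # Conditional probability that bit j becomes 1, given that the other
    # n-1 bits are zero: compare the zero vector with the unit vector e_j.
    return p_other / (p_zero + p_other)
```

For n = 100 this gives 2^−100 per update, matching the runs of roughly 2^99 consecutive zero-vector samples described above.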
Software
The WinBUGS software (whose open-source version is called OpenBUGS) performs
Bayesian analysis of complex statistical models using Markov chain Monte Carlo;
the name BUGS stands for Bayesian inference Using Gibbs Sampling.
JAGS (Just Another Gibbs Sampler) is a GPL-licensed program for the analysis of
Bayesian hierarchical models using Markov chain Monte Carlo.
Notes

1. Geman, S.; Geman, D. (1984). "Stochastic Relaxation, Gibbs Distributions,
and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis
and Machine Intelligence 6 (6): 721–741.

--- On Fri, 5/14/10, Sole Acha, Xavi <[email protected]> wrote:


From: Sole Acha, Xavi <[email protected]>
Subject: Re: [BiO BB] How to create a Transcription binding site profile
To: "General Forum at Bioinformatics.Org" <[email protected]>
Date: Friday, May 14, 2010, 11:33 AM


Clustal may work for you.

http://www.clustal.org/

HTH,

Xavi.

------
Xavier Solé Acha
Unitat de Biomarcadors i Susceptibilitat
Unit of Biomarkers and Susceptibility
Institut Català d'Oncologia // Catalan Institute of Oncology
Gran Via de L'Hospitalet 199-203
08907 L'Hospitalet de Llobregat, Barcelona, Spain.
Phone: +34 93 260 71 86 / +34 93 335 90 11 (ext. 3194)
Fax: +34 93 260 71 88
E-mail: x.sole (at) iconcologia.net

-----Original Message-----
From: [email protected] [mailto:[email protected]] On
behalf of Dan Bolser
Sent: Friday, 14 May 2010 17:30
To: General Forum at Bioinformatics.Org
Subject: Re: [BiO BB] How to create a Transcription binding site profile

I don't know anything specific, but you could try using 'Gibbs
sampling' algorithms to automatically discover the motif in the
sequence sets?



On 13 May 2010 21:02, Che, Anney (NIH/NCI) [E] <[email protected]> wrote:
>
> Hi everyone,
>
> I have some questions about how to create a transcription binding site
> profile.
>
> Since I have the sequences of the regions that bind to the gene from ChIP-chip,
> I can align all the sequences and then use the resulting alignment to
> generate a profile.
>
> Since this is high-throughput, does anyone know of a tool that can align
> short sequences programmatically and then output an alignment file?
>
> Also, is there a program that generates an alignment profile from an alignment?
>
>
> Thanks,
>
> Anney
>
> _______________________________________________
> BBB mailing list
> [email protected]
> http://www.bioinformatics.org/mailman/listinfo/bbb
>
