On Nov 21, 2011, at 3:52 PM, Filip Van Petegem wrote:

> As mentioned for X-ray structures, a Luzzati analysis may give information 
> about the positional errors, but there should be an increased resolution when 
> comparing domain movements, because it's unlikely for all atoms to have an 
> error in the same direction.


Here's how I think about it:

If you use the empirical coordinate error that I described previously, you can 
use simple statistics to calculate how likely you are to get a coordinated 
movement (relative to a fixed landmark).

I can use a 1-d case as an example. In this 1-d case, let's pretend we have a 
domain of N=25 atoms, where atom 2 is about 1 away from atom 1, atom 3 is about 
2 away from atom 1 (and 1 away from atom 2), etc., with a standard deviation of 
1 for each atom's position. If atom 1 of domain A is at 1, this is just

A_j = j

Then you can have domain B that has moved +1 compared to domain A:

B_j = j+1

Since we have an alignment (B_j -> A_j), then we can calculate the movement, X:

X = mean(B) - mean(A)

We can also calculate the error of the ensemble (i.e. the standard error of the 
mean):

sigmaE = std( (B - mean(B)) - (A - mean(A)) ) / sqrt(25)

Then, we can calculate how likely it is that we would observe a movement of 
size X by tail integration of the cumulative normal distribution. This is 
justified for the 3-d case because the least-squares superposition (from which 
we estimate the coordinate error) assumes normality.

Here is a simulation of this scenario in python:

py> import math
py> import numpy
py> from scipy.special import ndtr
py> a = numpy.array([numpy.random.normal(j) for j in range(25)])
py> b = numpy.array([numpy.random.normal(j + 1) for j in range(25)])
py> a
array([  1.38125295,  -0.27126096,   1.7597104 ,   1.36242299,
         3.88327659,   4.33063307,   5.00544708,   7.02888858,
         7.83945228,   9.72101719,  10.36231633,  10.29176378,
        11.78497375,  12.16082056,  14.31057296,  13.25941344,
        17.93779336,  18.05626047,  18.62148347,  20.52756478,
        19.73362283,  21.83953268,  22.28038617,  23.24545481,  22.96192518])
py> b
array([  3.32750181,   2.42664791,   3.23309368,   4.32882699,
         6.59985764,   6.49597664,   5.27921723,   7.8573831 ,
         9.98722475,  10.65225383,  11.69970159,  11.67435798,
        12.16191254,  13.69297801,  14.21845382,  17.21423427,
        16.89347161,  17.68778305,  17.89371115,  18.7679351 ,
        20.84842496,  20.69249899,  23.97436807,  23.54011453,  26.84986504])
py> X = b.mean() - a.mean()
py> sigma_ensemble = ((b - b.mean()) - (a - a.mean())).std() / math.sqrt(25)
py> X_standardized = (X - 0) / sigma_ensemble
py> 2 * ndtr(-abs(X_standardized))
    0.00011596192653578624

This means that, for the 1-d scenario I describe (using the random arrays 
generated above), a movement this large is expected by chance about once in 
every 10,000 "experiments". That is a p-value, i.e. an estimate of 
significance. Note that the factor of 2 comes from the fact that the 
distribution has two tails.
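As a sanity check on the tail-integration step, here is a quick Monte Carlo of 
the null hypothesis (no movement, both domains drawn with sigma of 1); the 
1.96-sigma threshold is just the textbook 5% two-tailed cutoff, and the 
simulated tail frequency should match the normal prediction:

```python
import numpy
from scipy.special import ndtr

numpy.random.seed(1)  # reproducible
N, trials = 25, 100000

# null hypothesis: no movement; the lattice positions j cancel in the
# difference of means, so we can just draw noise with sigma = 1
a = numpy.random.normal(size=(trials, N))
b = numpy.random.normal(size=(trials, N))

X = b.mean(axis=1) - a.mean(axis=1)   # under the null, X ~ N(0, 2/N)
sigma = numpy.sqrt(2.0 / N)

z = 1.96                              # 5% two-tailed cutoff
empirical = (numpy.abs(X) > z * sigma).mean()
predicted = 2 * ndtr(-z)              # ≈ 0.05

print(empirical, predicted)
```

With 100,000 trials the empirical tail frequency lands within a fraction of a 
percent of the normal-tail prediction, which is the behavior the p-value 
calculation relies on.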

A 3-d calculation using the rmsd as the coordinate error would be similar, 
except that you use the Euclidean distance formula to calculate distances in 
higher dimensions (instead of the absolute value of a simple subtraction, as 
in 1-d).
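For what it's worth, here is a rough sketch of what that 3-d version might 
look like, using made-up coordinates (domain B is domain A translated by 1 
along x, with noise of sigma 1 on every coordinate); the superposition onto a 
fixed landmark is assumed to have been done already:

```python
import numpy
from scipy.special import ndtr

numpy.random.seed(0)  # reproducible
N = 25

# hypothetical domains: B is A shifted by (1, 0, 0), plus coordinate noise
A = numpy.random.normal(size=(N, 3))
B = A + numpy.array([1.0, 0.0, 0.0]) + numpy.random.normal(size=(N, 3))

d = B - A                                # per-atom displacement vectors
X = numpy.linalg.norm(d.mean(axis=0))    # Euclidean length of the mean movement

# error of the mean, from the spread of the displacements about their mean
sigma_e = numpy.linalg.norm(d - d.mean(axis=0), axis=1).std() / numpy.sqrt(N)

p = 2 * ndtr(-abs(X / sigma_e))
print(X, sigma_e, p)
```

This is just the 1-d recipe with distances computed by `numpy.linalg.norm`; 
for a real structure you would take A and B from the aligned coordinates and 
sigma from the empirical coordinate error rather than simulating them.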

James
