On Nov 21, 2011, at 3:52 PM, Filip Van Petegem wrote:
> As mentioned for X-ray structures, a Luzzati analysis may give information
> about the positional errors, but there should be an increased resolution when
> comparing domain movements, because it's unlikely for all atoms to have an
> error in the same direction.
Here's how I think about it:
If you use the empirical coordinate error that I described previously, you can
use simple statistics to calculate how likely you are to get a coordinated
movement (relative to a fixed landmark).
I can use a 1-d case as an example. In this 1-d case, let's pretend we have a
domain of N=25 atoms, where atom 2 is about 1 away from atom 1, atom 3 is 2
away from atom 1 (and 1 away from atom 2), and so on, with a standard deviation
of 1 for the position of each atom. If atom 1 of domain A is at 1, this is just:
A_j = j
Then you can have domain B that has moved +1 compared to domain A:
B_j = j+1
Since we have an alignment (B_j -> A_j), we can calculate the movement, X:
X = mean(B) - mean(A)
We can also calculate the error of the ensemble (a.k.a. the error of the mean):
sigmaE = std( (B - mean(B)) - (A - mean(A)) ) / sqrt(25)
Then we can calculate how likely it is that we would observe the movement X by
tail integration of the cumulative normal distribution. Assuming normality here
is justified, even in the 3-d case, because the least-squares superposition
(from which we estimate the coordinate error) itself assumes normality.
Here is a simulation of this scenario in python:
py> import math
py> import numpy
py> from scipy.special import ndtr
py> a = numpy.array([numpy.random.normal(j) for j in range(25)])
py> b = numpy.array([numpy.random.normal(j+1) for j in range(25)])
py> a
array([ 1.38125295, -0.27126096, 1.7597104 , 1.36242299,
3.88327659, 4.33063307, 5.00544708, 7.02888858,
7.83945228, 9.72101719, 10.36231633, 10.29176378,
11.78497375, 12.16082056, 14.31057296, 13.25941344,
17.93779336, 18.05626047, 18.62148347, 20.52756478,
19.73362283, 21.83953268, 22.28038617, 23.24545481, 22.96192518])
py> b
array([ 3.32750181, 2.42664791, 3.23309368, 4.32882699,
6.59985764, 6.49597664, 5.27921723, 7.8573831 ,
9.98722475, 10.65225383, 11.69970159, 11.67435798,
12.16191254, 13.69297801, 14.21845382, 17.21423427,
16.89347161, 17.68778305, 17.89371115, 18.7679351 ,
20.84842496, 20.69249899, 23.97436807, 23.54011453, 26.84986504])
py> X = b.mean() - a.mean()
py> sigma_ensemble = ((b - b.mean()) - (a - a.mean())).std() / math.sqrt(25)
py> X_standardized = (X - 0) / sigma_ensemble
py> 2 * ndtr(-abs(X_standardized))
0.00011596192653578624
This means that, for the 1-d scenario I describe (using the random arrays
generated above), a movement this large is expected by chance only about once
in every 10,000 "experiments", so the result is a p-value, or estimate of
significance. Note that the factor of 2 comes from the fact that the
cumulative distribution has two tails.
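Since the session above used an unseeded random number generator, the exact numbers will not reproduce. Here is the same 1-d simulation packaged with a fixed seed (my packaging, not part of the original session, so the numbers will differ from the arrays above):

```python
import numpy as np
from scipy.special import ndtr

rng = np.random.default_rng(42)
N = 25

# Same 1-d scenario as above: domain B is domain A shifted by +1, and each
# atomic position carries normally distributed error with sigma = 1.
a = np.arange(N) + rng.normal(size=N)
b = np.arange(N) + 1 + rng.normal(size=N)

# Movement, error of the mean, and the two-tailed p-value.
X = b.mean() - a.mean()
sigmaE = ((b - b.mean()) - (a - a.mean())).std() / np.sqrt(N)
p = 2 * ndtr(-abs(X / sigmaE))
print(X, sigmaE, p)
```

Because the seed is fixed, the p-value is reproducible from run to run; it will not equal the 0.00011596... above, which came from a different random draw.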
A 3-d calculation using the rmsd as the coordinate error would be similar,
except that you use the Euclidean distance formula to calculate the distances
in higher dimensions (instead of the absolute value of a simple subtraction,
as in 1-d).
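To make that concrete, here is one way a 3-d version might look. This is a sketch under my own assumptions, not the exact procedure above: the errors are taken as isotropic, the domains are already aligned, and the squared standardized movement is compared against a chi-squared distribution with 3 degrees of freedom (the 3-d analogue of the 1-d normal tail, since the squared length of a vector of 3 independent standard normals is chi-squared distributed):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
N = 25

# Hypothetical aligned domains: B is A translated by (1, 0, 0), and both
# carry ~1 unit of isotropic coordinate error per atom.
true_xyz = rng.uniform(0, 20, size=(N, 3))
A = true_xyz + rng.normal(0, 1, size=(N, 3))
B = true_xyz + np.array([1.0, 0.0, 0.0]) + rng.normal(0, 1, size=(N, 3))

# Movement vector between the domain centroids and its Euclidean length.
X = B.mean(axis=0) - A.mean(axis=0)
movement = np.linalg.norm(X)

# Per-coordinate error of the mean, from the residual displacements.
resid = (B - B.mean(axis=0)) - (A - A.mean(axis=0))
sigmaE = resid.std() / np.sqrt(N)

# Under the null hypothesis of no movement, (|X| / sigmaE)^2 follows a
# chi-squared distribution with 3 degrees of freedom; its upper tail is p.
p = chi2.sf((movement / sigmaE) ** 2, df=3)
print(movement, sigmaE, p)
```

If the errors are strongly anisotropic, the isotropic sigmaE here is too crude and the per-axis errors would need to be treated separately.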
James