Dear Bernhard
I am wondering where I should cut off my data. Here are the statistics
from XDS processing.
Maia
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS     COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno   Nano
     LIMIT  OBSERVED  UNIQUE  POSSIBLE       OF DATA  observed  expected                                        Corr
     10.06      5509     304       364         83.5%      3.0%      4.4%      5509    63.83    3.1%     1.0%     11%   0.652    173
      7.12     11785     595       595        100.0%      3.5%      4.8%     11785    59.14    3.6%     1.4%    -10%   0.696    414
      5.81     15168     736       736        100.0%      5.0%      5.6%     15168    51.88    5.1%     1.8%     -9%   0.692    561
      5.03     17803     854       854        100.0%      5.5%      5.7%     17803    50.02    5.6%     2.2%    -10%   0.738    675
      4.50     20258     964       964        100.0%      5.1%      5.4%     20258    52.61    5.3%     2.1%    -16%   0.710    782
      4.11     22333    1054      1054        100.0%      5.6%      5.7%     22333    50.89    5.8%     2.0%    -16%   0.705    878
      3.80     23312    1137      1137        100.0%      7.0%      6.6%     23312    42.95    7.1%     3.0%    -13%   0.770    952
      3.56     25374    1207      1208         99.9%      7.6%      7.3%     25374    40.56    7.8%     3.4%    -18%   0.739   1033
      3.35     27033    1291      1293         99.8%      9.7%      9.2%     27033    33.73   10.0%     4.1%    -12%   0.765   1107
      3.18     29488    1353      1353        100.0%     11.6%     11.6%     29488    28.16   11.9%     4.4%     -7%   0.750   1176
      3.03     31054    1419      1419        100.0%     15.7%     15.9%     31054    21.77   16.0%     6.9%     -9%   0.741   1243
      2.90     32288    1478      1478        100.0%     21.1%     21.6%     32288    16.99   21.6%     9.2%     -6%   0.745   1296
      2.79     33807    1542      1542        100.0%     28.1%     28.8%     33807    13.07   28.8%    12.9%     -2%   0.783   1361
      2.69     34983    1604      1604        100.0%     37.4%     38.7%     34983     9.95   38.3%    17.2%     -2%   0.743   1422
      2.60     35163    1653      1653        100.0%     48.8%     48.0%     35163     8.03   50.0%    21.9%     -6%   0.754   1475
      2.52     36690    1699      1699        100.0%     54.0%     56.0%     36690     6.98   55.3%    25.9%      0%   0.745   1517
      2.44     37751    1757      1757        100.0%     67.9%     70.4%     37751     5.61   69.5%    32.5%     -5%   0.733   1577
      2.37     38484    1798      1799         99.9%     82.2%     84.5%     38484     4.72   84.2%    36.5%      2%   0.753   1620
      2.31     39098    1842      1842        100.0%     91.4%     94.3%     39098     4.19   93.7%    43.7%     -3%   0.744   1661
      2.25     38809    1873      1923         97.4%    143.4%    139.3%     38809     2.84  147.1%    69.8%     -2%   0.693   1696
     total    556190   26160     26274         99.6%     11.9%     12.2%    556190    21.71   12.2%     9.7%     -5%   0.739  22619
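For anyone who wants to try different thresholds on these shells, here is a minimal sketch that scans the resolution and I/SIGMA columns of the table above and reports the last shell before <I/sigma> first drops below a chosen value. The function name and the thresholds tried (2.0 and 3.0, the two values debated in this thread) are illustrative assumptions, not a recommendation or anything XDS itself does.

```python
# (d_min in Angstrom, <I/sigma>) per shell, copied from the XDS table above.
SHELLS = [
    (10.06, 63.83), (7.12, 59.14), (5.81, 51.88), (5.03, 50.02),
    (4.50, 52.61), (4.11, 50.89), (3.80, 42.95), (3.56, 40.56),
    (3.35, 33.73), (3.18, 28.16), (3.03, 21.77), (2.90, 16.99),
    (2.79, 13.07), (2.69, 9.95), (2.60, 8.03), (2.52, 6.98),
    (2.44, 5.61), (2.37, 4.72), (2.31, 4.19), (2.25, 2.84),
]


def resolution_cutoff(shells, min_i_over_sigma):
    """Return d_min of the last shell before <I/sigma> first drops below
    the threshold, or None if even the first shell is below it.

    Hypothetical helper for illustration; shells must be ordered from
    low to high resolution, as in the XDS table.
    """
    cutoff = None
    for d_min, i_over_sigma in shells:
        if i_over_sigma < min_i_over_sigma:
            break
        cutoff = d_min
    return cutoff


if __name__ == "__main__":
    for threshold in (2.0, 3.0):
        print(threshold, resolution_cutoff(SHELLS, threshold))
```

With these numbers every shell, including the last one at 2.25 A, stays above <I/sigma> = 2, while a threshold of 3 would drop only that last shell.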
Bernhard Rupp (Hofkristallrat a.D.) wrote:
I think this suppression of high resolution shells via <I/sigI> cutoffs is
partially attributable to a conceptual misunderstanding of what these (darn)
R-values mean in refinement versus data merging.
In refinement, even a random atom structure follows the Wilson distribution,
and therefore, even a completely wrong non-centrosymmetric structure will
not - given proper scaling - give an Rf of more than 59%.
There is no such limit for the basic linear merging R. However, there is a
simple relation between <I/sigI> and R-merge (provided no other indecency
has been done to the data). It simply is (BMC) Rm = 0.8/<I/sigI>. I.e., for
<I/sigI> = 0.8 you get 100%, for 2 we obtain 40%, which, interpreted as Rf,
would be dreadful, but for <I/sigI> = 3 we get Rm = 0.27, and that looks
acceptable for an Rf (or to an uninformed reviewer).
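The arithmetic behind those three numbers can be checked with a few lines of Python. This is only a sketch of the approximate relation quoted above (Rm = 0.8/<I/sigI>); the function name is made up for illustration, and the relation is taken as stated in this message, under the same proviso that nothing else has been done to the data.

```python
def r_merge_estimate(mean_i_over_sigma):
    """Approximate R-merge (as a fraction) from <I/sigI>,
    using the relation Rm = 0.8 / <I/sigI> quoted in the text."""
    if mean_i_over_sigma <= 0:
        raise ValueError("<I/sigI> must be positive")
    return 0.8 / mean_i_over_sigma


if __name__ == "__main__":
    # The three cases mentioned in the text.
    for value in (0.8, 2.0, 3.0):
        rm = r_merge_estimate(value)
        print(f"<I/sigI> = {value:.1f}  ->  Rm ~ {100 * rm:.0f}%")
```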
Btw, I also wish to point out that I/sig cutoffs are not exactly the
cutoff criterion for anomalous phasing; a more direct measure is a signal
cutoff such as <delF/sig(delF)>; George, I believe, uses 1.3 for SAD.
Interestingly, in almost all structures I played with, <delF/sig(delF)> was
0.8 in both cases: pure noise in the anomalous data, or no anomalous
scatterer present. I haven't figured out or proved the statistics yet,
nor whether this is generally true or just numerology...
And, the usual biased rant - irrespective of Hamilton tests, nobody really
needs these popular unweighted linear residuals which shall not be named,
particularly on F. They only cause trouble.
Best regards, BR
-----------------------------------------------------------------
Bernhard Rupp
001 (925) 209-7429
+43 (676) 571-0536
[email protected]
[email protected]
http://www.ruppweb.org/
-----------------------------------------------------------------
Structural Biology is the practice of
crystallography without a license.
-----------------------------------------------------------------
-----Original Message-----
From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of Bart
Hazes
Sent: Thursday, March 03, 2011 7:08 AM
To: [email protected]
Subject: Re: [ccp4bb] I/sigmaI of >3.0 rule
There seems to be an epidemic of papers with I/Sigma > 3 (sometimes much
larger). In fact such cases have become so frequent that I fear some people
are starting to believe that this is the proper procedure. I don't know where that
has come from as the I/Sigma ~ 2 criterion has been established long ago and
many consider that even a tad conservative. It simply pains me to see people
going to the most advanced synchrotrons to boost their highest resolution
data and then simply throw away much of it.
I don't know what has caused this wave of high I/Sigma threshold use, but
here are some ideas:
- High I/Sigma cutoffs are normal for (S/M)AD data sets, where a stricter
focus on data quality is needed. Perhaps some people have started to think
this is the norm.
- For some datasets Rsym goes up strongly while I/SigI is still reasonable. I
personally believe this is due to radiation damage, which affects Rsym (which
compares reflections taken after different amounts of exposure) much more
than I/SigI, which is based on individual reflections. A good test would be
to see if processing only the first half of the dataset improves Rsym (or
better, Rrim).
- Most detectors are square and if the detector is too far from the crystal
then the highest resolution data falls beyond the edges of the detector. In
this case one could, and should, still process data into the corners of the
detector. Data completeness at higher resolution may suffer but each
additional reflection still represents an extra restraint in refinement and
a Fourier term in the map. Due to crystal symmetry the effect on
completeness may even be less than expected.
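To put a number on the corner argument, here is a rough sketch using Bragg's law for a flat, square detector. It assumes the detector is perpendicular to the beam with the beam centre in the middle; the function names and the example geometry (300 mm side, 200 mm distance, 1.0 A wavelength) are hypothetical and chosen only for illustration.

```python
import math


def d_spacing(radius_mm, distance_mm, wavelength_a):
    """Resolution (Angstrom) at a given radius on a flat detector.

    Assumes the detector is perpendicular to the beam, so that
    tan(2*theta) = radius / distance and d = lambda / (2 * sin(theta)).
    """
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


def edge_and_corner_resolution(side_mm, distance_mm, wavelength_a):
    """Resolution at the edge midpoint and at the corner of a square
    detector with the beam centre in the middle (an assumption)."""
    half_side = side_mm / 2.0
    edge = d_spacing(half_side, distance_mm, wavelength_a)
    corner = d_spacing(half_side * math.sqrt(2.0), distance_mm, wavelength_a)
    return edge, corner


if __name__ == "__main__":
    # Hypothetical geometry, for illustration only.
    edge, corner = edge_and_corner_resolution(300.0, 200.0, 1.0)
    print(f"edge: {edge:.2f} A, corner: {corner:.2f} A")
```

The corner radius is sqrt(2) times the edge radius, so the corners always reach a smaller d-spacing than the edges for the same detector distance.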
Bart
On 11-03-03 04:29 AM, Roberto Battistutta wrote:
Dear all,
I got a reviewer comment that indicates the "need to refine the structures
at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised
coordinate files to the PDB for validation.". In the manuscript I present
some crystal structures determined by molecular replacement using the same
protein in a different space group as search model. Does anyone know the
origin or the theoretical basis of this "I/sigmaI>3.0" rule for an
appropriate resolution?
Thanks,
Bye,
Roberto.
Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
[email protected]
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129
Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it