Yes, I would classify anything with I/sigmaI < 3 as "weak". And yes, of
course it is possible to get "weak" spots from small molecule crystals.
After all, there is no spot so "strong" that it cannot be defeated by a
sufficient amount of background! I just meant that, relatively
speaking, the intensities diffracted from a small molecule crystal are
orders of magnitude brighter than those from a macromolecular crystal of
the same size, and even the same quality (the 1/Vcell^2 term in Darwin's
formula).
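The 1/Vcell^2 scaling can be made concrete with a toy calculation (a minimal sketch with made-up cell volumes; only the Darwin-formula dependence on cell volume is kept, everything else is held fixed):

```python
# Toy illustration: for a crystal of fixed total volume, the diffracted power
# per spot in Darwin's formula scales as 1/Vcell^2, so a smaller unit cell
# concentrates the same scattering into far brighter spots.

def relative_spot_intensity(v_cell_A3):
    """Relative integrated spot intensity, keeping only the 1/Vcell^2
    term of Darwin's formula (all other factors assumed equal)."""
    return 1.0 / v_cell_A3 ** 2

small_molecule = relative_spot_intensity(1_000.0)      # ~10 A cell edge
protein        = relative_spot_intensity(1_000_000.0)  # ~100 A cell edge
print(small_molecule / protein)  # a factor of 1e6: six orders of magnitude
```

A 10-fold larger cell edge means a 1000-fold larger cell volume, hence a million-fold dimmer spot for the same crystal size and quality.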
I find it interesting that you point out the use of a 2 sigma(I)
intensity cutoff for small molecule data sets! Is this still common
practice? I am not a card-carrying "small molecule crystallographer",
so I'm not sure. However, if that is the case, then by definition there
are no "weak" intensities in the data set. And this is exactly the kind
of data you want for least-squares refinement targets and computing "%
error" quality metrics like R factors. For likelihood targets, however,
the "weak" data are actually a powerful restraint.
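The "no weak intensities by definition" point can be illustrated with simulated data (my numbers and a constant sigma, not Ron's actual data sets):

```python
import random

random.seed(0)
sigma = 10.0  # assume a constant sigma(I) for simplicity
# Hypothetical weak data: modest true intensities plus Gaussian noise.
obs = [random.gauss(15.0, sigma) for _ in range(10_000)]

# The 2 sigma(I) "observed" cutoff: everything below is flagged unobserved.
kept = [i for i in obs if i > 2.0 * sigma]

print(f"flagged unobserved: {1 - len(kept) / len(obs):.0%}")
print(f"weakest surviving I/sigma(I): {min(kept) / sigma:.2f}")  # always > 2
```

By construction, every reflection that survives the cutoff has I/sigma(I) > 2, so the "observed" set contains no weak data at all.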
-James Holton
MAD Scientist
On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote:
Could you please expand on your statement that "small-molecule data
has essentially no weak spots."? The small molecule data sets I've
worked with have had large numbers of "unobserved" reflections where I
used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you
consider those "weak" spots or not? Ron
On Sun, 6 Mar 2011, James Holton wrote:
I should probably admit that I might be indirectly responsible for
the resurgence of this I/sigma > 3 idea, but I never intended this in
the way described by the original poster's reviewer!
What I have been trying to encourage people to do is calculate R
factors using only hkls for which the signal-to-noise ratio is > 3.
Not refinement! Refinement should be done against all data. I merely
propose that weak data be excluded from R-factor calculations after
the refinement/scaling/merging/etc. is done.
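The proposal can be sketched as follows (a minimal illustration with made-up numbers; the function name is mine, not from any real program):

```python
def r_factor(fobs, fcalc, sig_fobs, cutoff=3.0):
    """R = sum|Fobs - Fcalc| / sum(Fobs), restricted to strong reflections.
    The signal-to-noise cutoff is applied only here, for reporting;
    refinement itself should still use ALL the data."""
    strong = [(fo, fc) for fo, fc, s in zip(fobs, fcalc, sig_fobs)
              if s > 0 and fo / s > cutoff]
    if not strong:
        return float("nan")
    return (sum(abs(fo - fc) for fo, fc in strong)
            / sum(fo for fo, _ in strong))

fobs  = [100.0, 50.0, 1.0]   # last reflection is near-zero: I/sigma = 0.2
fcalc = [95.0, 55.0, 5.0]
sig   = [5.0, 5.0, 5.0]
print(r_factor(fobs, fcalc, sig))  # (5 + 5) / 150 ~ 0.067; weak spot excluded
```

Note the weak reflection still participated in producing Fcalc (the refinement); it is only dropped from the quality metric.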
This is because R factors are a metric of the FRACTIONAL error in
something (aka a "% difference"), but a "% error" is only meaningful
when the thing being measured is not zero. However, in
macromolecular crystallography, we tend to measure a lot of
"zeroes". There is nothing wrong with measuring zero! An excellent
example of this is confirming that a systematic absence is in fact
"absent". The "sigma" on the intensity assigned to an absent spot is
still a useful quantity, because it reflects how confident you are in
the measurement. That is, a sigma of "10" vs. "100" means you are more
sure that the intensity is zero. However, there is no "R factor" for
systematic absences. How could there be? This is because the
definition of "% error" starts to break down as the "true" spot
intensity gets weaker, and it becomes completely meaningless when the
"true" intensity reaches zero.
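A quick numerical illustration of this breakdown (the fixed absolute error of 10 is an arbitrary choice of mine):

```python
# As the true intensity approaches zero, the fractional ("%") error on a
# fixed absolute error blows up, while the sigma itself stays meaningful.
abs_error = 10.0
for true_I in [1000.0, 100.0, 10.0, 1.0, 0.0]:
    pct = abs_error / true_I * 100 if true_I else float("inf")
    print(f"I = {true_I:7.1f}  ->  % error = {pct}")
```

The sigma is "10" in every row, and equally informative in every row; only the percentage interpretation collapses.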
Historically, I believe the widespread use of R factors came about
because small-molecule data has essentially no weak spots. With the
exception of absences (which are not used in refinement), spots from
"salt crystals" are strong all the way out to the edge of the detector
(even out to the "limiting sphere", which is defined by the x-ray
wavelength). So, when all the data are strong, a "% error" is an
easy-to-calculate quantity that actually describes the "sigma"s of
the data very well. That is, sigma(I) of strong spots tends to be
dominated by things like beam flicker, spindle stability, shutter
accuracy, etc. All these usually add up to ~5% error, and indeed
even the Braggs could typically get +/-5% for the intensity of the
diffracted rays they were measuring. Things like Rsym were therefore
created to check that nothing "funny" happened in the measurement.
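The Rsym being described can be sketched as follows (a simplified, unweighted form with made-up measurements; real data-processing programs use more elaborate weighting):

```python
from statistics import mean

def r_sym(groups):
    """Rsym = sum|I_i - <I>| / sum(I_i) over symmetry-related measurements.
    groups: dict mapping each unique hkl to its list of observed intensities.
    For all-strong data, a ~5% value matches the beam-flicker/shutter-error
    budget described above."""
    num = den = 0.0
    for obs in groups.values():
        avg = mean(obs)
        num += sum(abs(i - avg) for i in obs)
        den += sum(obs)
    return num / den

# Two made-up reflections, each measured twice:
print(r_sym({(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}))  # ~0.032
```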
For similar reasons, the quality of a model refined against
all-strong data is described very well by a "% error", and this is
why the refinement R factors rapidly became popular. Most people
intuitively know what you mean if you say that your model fits the
data to "within 5%". In fact, a widely used criterion for the
correctness of a "small molecule" structure is that the refinement R
factor must be LOWER than Rsym. This is equivalent to saying that
your curve (model) fit your data "to within experimental error".
Unfortunately, this has never been the case for macromolecular
structures!
The problem with protein crystals, of course, is that we have lots of
"weak" data. And by "weak", I don't mean "bad"! Yes, it is always
nicer to have more intense spots, but there is nothing shameful about
knowing that certain intensities are actually very close to zero. In
fact, from the point of view of the refinement program, isn't
describing some high-angle spot as: "zero, plus or minus 10", better
than "I have no idea"? Indeed, several works mentioned already as
well as the "free lunch algorithm" have demonstrated that these
"zero" data can actually be useful, even if it is well beyond the
"resolution limit".
So, what do we do? I see no reason to abandon R factors, since they
have such a long history and give us continuity of criteria going
back almost a century. However, I also see no reason to punish
ourselves by including lots of zeroes in the denominator. In fact,
using weak data in an R factor calculation defeats their best
feature. R factors are a very good estimate of the fractional
component of the total error, provided they are calculated with
strong data only.
Of course, with strong and weak data, the best thing to do is compare
the model-data disagreement with the magnitude of the error. That
is, compare |Fobs-Fcalc| to sigma(Fobs), not Fobs itself. Modern
refinement programs do this! And I say the more data the merrier.
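That comparison can be sketched as follows (a minimal illustration with made-up numbers; modern refinement programs embed this idea in full likelihood targets rather than a bare ratio):

```python
def sigma_weighted_residual(fobs, fcalc, sig):
    """Compare |Fobs - Fcalc| to sigma(Fobs) rather than to Fobs itself,
    so a reflection measured as "0 +/- 10" still contributes sensibly."""
    return [abs(fo - fc) / s for fo, fc, s in zip(fobs, fcalc, sig)]

# A near-zero observation and a strong one, both handled on equal footing:
print(sigma_weighted_residual([0.0, 100.0], [3.0, 95.0], [10.0, 5.0]))
# The weak spot disagrees by only 0.3 sigma; a "% error" for it would
# be undefined, yet this residual is perfectly well behaved.
```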
-James Holton
MAD Scientist
On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:
hi
Recently on a paper I submitted, it was the editor of the journal
who wanted exactly the same thing. I never argued with the editor
about this (should have maybe), but it could be one cause of the
epidemic that Bart Hazes saw....
best regards
Marjolein
On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:
Dear all,
I got a reviewer comment indicating the "need to refine the
structures at an appropriate resolution (I/sigmaI of>3.0), and
re-submit the revised coordinate files to the PDB for validation.".
In the manuscript I present some crystal structures determined by
molecular replacement using the same protein in a different space
group as search model. Does anyone know the origin or the
theoretical basis of this "I/sigmaI>3.0" rule for an appropriate
resolution?
Thanks,
Bye,
Roberto.
Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine)
via Orus 2, 35129 Padova - ITALY
tel. +39.049.7923236
fax +39.049.7923250
www.vimm.it