To add a little to James's excellent summary.
As reviewers, I think we should always question results where the I/SigI in the outer shell is still greater than 2-3. Authors should at least be asked to justify why they have not collected the best available experimental data. Ditto if Rfree is too low for the resolution (e.g. differing from R by < 5% at 2.8 A): the authors should be challenged. There are many ways of underestimating your Rfree, all of which compromise the maximum likelihood refinement, and they should be deprecated!

To finish with a question that always puzzles me: why do structures which generate very similar quality maps at similar resolutions have such different Rfactor profiles? I have seen lovely final maps at 2A with R < 18%, and also lovely final maps at 2A with Rfactors ~24%... It might be a radiation damage phenomenon, I guess.
  Eleanor


James Holton wrote:

***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***



Well, since I was mentioned by name, I suppose I should put my two cents in:

Rmerge is NOT a good way to judge your last resolution shell!

My advice, if you are faced with a reviewer who complains your Rmerge is too high, is to change the name to Rsym. This is actually the appropriate name for the statistic you are quoting: Rmerge (traditionally) refers to the R factor of combining data from two crystals, whereas Rsym refers to the agreement between symmetry mates after scaling.

Rsym (and Rmerge) used to be useful things to quote back when people applied a 3-sigma cutoff to their raw observation data. That seems like a borderline criminal thing to do nowadays (and it is), but in the dark ages before maximum likelihood, the only way to keep a least-squares refinement package from chasing noise was to make sure you didn't confuse it with a ton of weak (noisy) data.

All "R" statistics are supposed to be measuring one type of error (R is for residual). Rmerge is supposed to measure non-isomorphism. Rsym is supposed to measure deviation from true symmetry. Rcryst and Rfree measure the "incorrectness" of your model. The absolute value of an "R" statistic is only meaningful if you can normalize out the contribution of other sources of error. Weak data have more random noise than strong data, and the more high-resolution data you include, the more weak data you will have.

Applying a 3-sigma cutoff eliminates any spots measured with more than ~33% error (if you believe your sigmas). The remaining strong spots have relatively little random error (from counting statistics), so the 3-sigma cutoff tends to "normalize" data collected from one crystal or another. However, if you apply a 3-sigma cutoff, you will have fewer and fewer spots as you get out to high resolution. This is why "completeness" became a criterion for the high-resolution limit.
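[Ed.: to make the point concrete, here is a toy sketch of the Rsym statistic and the old-style sigma cutoff described above. The data structure and function names are illustrative, not any real scaling program's API; real programs weight and scale observations far more carefully.]

```python
import numpy as np

def r_sym(obs):
    """Rsym = sum|I - <I>| / sum I over symmetry-equivalent observations.
    obs maps an hkl to a list of (I, sigI) measurements."""
    num = den = 0.0
    for meas in obs.values():
        I = np.array([m[0] for m in meas], dtype=float)
        num += np.abs(I - I.mean()).sum()  # residual against the mean
        den += I.sum()
    return num / den

def sigma_cutoff(obs, n=3.0):
    """Old-style n-sigma cutoff: drop observations with I < n*sigI.
    Reflections left with fewer than two observations no longer
    contribute to Rsym and are dropped entirely."""
    kept = {}
    for hkl, meas in obs.items():
        strong = [(I, s) for I, s in meas if I >= n * s]
        if len(strong) >= 2:
            kept[hkl] = strong
    return kept
```

Because the cutoff preferentially removes the weak (high-fractional-error) observations, the surviving strong spots give a lower, "normalized" Rsym — which is exactly why the statistic looked well-behaved in the 3-sigma era.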

Anyway, in summary: I say don't worry about your Rmerge in the high resolution shell. I/sd is much more meaningful. Just be careful to optimize your error model (SDCORR in scala, error_scale_factor and estimated_error in scalepack) so that your scatter/sigma values in the scala log are close to one (or the final "Chi^2" in scalepack). As for what I/sd you should cut your data off at: I use an I/sd of 1.5, mainly because it is a "compromise" between 1.0 (signal = noise) and 2.0 (signal = 2x noise).
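[Ed.: a minimal sketch of picking a resolution limit from shell-averaged I/sd, assuming the 1.5 cutoff suggested above. The binning scheme (equal volumes in 1/d^3) and function name are my own illustrative choices, not scala's or scalepack's actual algorithm.]

```python
import numpy as np

def resolution_cutoff(d, i_over_sig, nshells=10, threshold=1.5):
    """Bin reflections into shells of equal reciprocal volume (1/d^3)
    and return the d-spacing at which mean I/sd first drops below
    `threshold`, or the full data limit if no shell fails."""
    d = np.asarray(d, dtype=float)
    ios = np.asarray(i_over_sig, dtype=float)
    s3 = 1.0 / d**3                      # increases with resolution
    edges = np.linspace(s3.min(), s3.max(), nshells + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (s3 >= lo) & (s3 <= hi)
        if sel.any() and ios[sel].mean() < threshold:
            return lo ** (-1.0 / 3.0)    # inner edge of failing shell
    return float(d.min())                # all shells pass
```

Raising the threshold to 2.0 (or the 3 of the old sigma-cutoff days) pushes the limit to lower resolution; 1.5 keeps more of the weak-but-real data that maximum likelihood can use.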

As a comment: I fear that the recent rash of structures with I/sd of 6 or 8 in the outer resolution shell is happening because Rfree is also subject to the unfortunate feature of "R" statistics mentioned above: you get a lower Rcryst and Rfree if you are willing to sacrifice a little "resolution". I guess it is just too tempting to play with your resolution limit when you run out of model building ideas and your Rfree is still too high. This is a BAD BAD thing to do. BAD!! Better to calculate an Rfree using only data with F/sd > 3 (and note it as such!), and have the decency to deposit all your structure factors.
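[Ed.: the filtered-R suggestion above amounts to the following. The function name and interface are illustrative; this is the conventional R = sum||Fo|-|Fc|| / sum|Fo|, optionally restricted to strong reflections.]

```python
import numpy as np

def r_factor(f_obs, f_calc, sig_f=None, min_f_over_sig=None):
    """Conventional crystallographic R factor. If min_f_over_sig is
    given, only reflections with F/sigF above it contribute, so the
    statistic is not dominated by weak high-resolution data."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    if min_f_over_sig is not None:
        sel = f_obs / np.asarray(sig_f, dtype=float) > min_f_over_sig
        f_obs, f_calc = f_obs[sel], f_calc[sel]
    return np.abs(f_obs - f_calc).sum() / f_obs.sum()
```

Quoting such a filtered Rfree (clearly labelled) lets you keep all the weak data in refinement and deposition while still reporting a residual that is comparable across data sets.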
-James Holton
MAD Scientist


Bart Hazes wrote:

Hi Ashima,

With these statistics you shouldn't have to worry about reviewers; it looks perfectly sensible. Actually, I'm much more concerned about the recent epidemic of overly pessimistic resolution cutoffs. In our journal club, at least half the papers have I/SigI in the highest resolution bin in the 3-6 range, which means they could have gotten significantly higher resolution. There are situations where data quality is more important than resolution, for instance (anomalous) phasing, but I see the same with many native data sets.

It is not clear to me whether people are placing the detector too far from the crystal, and thus not even measuring the highest resolution data, or whether they just elect not to process those data. Why??? To get nicer-looking statistics???? That would be VERY bad practice!!!

A kinder view is that the detector distance is set based on the apparent resolution of the first image(s), which underestimates the true resolution of a high-redundancy data set. If I don't need a long detector distance to resolve spots, I prefer to select a distance where my visible diffraction uses the central 80-90% of the detector, allowing mosflm to try to extract some sensible information from beyond what the eye can see.

This looks like something James Holton may have looked at. If so I'd be interested to hear if he or the elves have come up with a magic rule.

Bart

Ashima Bagaria wrote:




HI all,
In regard to my CCP4 question about acceptable Rmerge values in the last resolution shell: the other parameters pertaining to the protein data at 3.5 A are

I/sigmaI = 13.1 (2.3)
%completeness = 95.7(96.8)
multiplicity = 3.8

All suggestions are welcome

Regards
ashima







