To add a little to James's excellent summary.
As reviewers, I think we should always question results where the I/SigI
is > 2-3 in the outer shell. Authors should at least be asked to justify
why they have not collected the best available experimental data.
Ditto if Rfree is too low for the resolution (e.g. differing by < 5% at
2.8A): the authors should be challenged. There are many ways of
underestimating your Rfree, all of which compromise the maximum
likelihood refinement, and all of which should be deprecated!
To finish with a question that always puzzles me: why do structures
which generate very similar quality maps at similar resolutions have
such different R-factor profiles? I have seen lovely final maps at 2A
with R < 18%, and also lovely final maps at 2A with R-factors ~
24%... It might be a radiation damage phenomenon, I guess.
Eleanor
James Holton wrote:
*** For details on how to be removed from this list visit the ***
*** CCP4 home page http://www.ccp4.ac.uk ***
Well, since I was mentioned by name, I suppose I should put my two
cents in:
Rmerge is NOT a good way to judge your last resolution shell!
My advice, if you are faced with a reviewer who complains that your
Rmerge is too high, is to change the name to Rsym. This is actually the
appropriate name for the statistic you are quoting. Rmerge
(traditionally) refers to the R factor from combining data from two
crystals. Rsym refers to the agreement between symmetry mates after
scaling.
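For concreteness, the symmetry agreement James describes is usually written Rsym = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i, where <I> is the mean over the symmetry-equivalent measurements of each reflection. A minimal sketch of that formula (the function name and toy observations are invented for illustration, not from any CCP4 program):

```python
from collections import defaultdict

def r_sym(observations):
    """Rsym = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i, where
    observations is a list of (hkl, intensity) pairs and <I> is the mean
    over all symmetry-equivalent measurements of that hkl."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    num = 0.0
    den = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)
        den += sum(intensities)
    return num / den

# Toy data: two reflections, each measured three times.
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((1, 0, 0), 90.0),
       ((2, 1, 0), 10.0), ((2, 1, 0), 14.0), ((2, 1, 0), 6.0)]
print(round(r_sym(obs), 3))  # -> 0.085
```

Note how the weak reflection contributes a much larger relative residual, which is exactly why this statistic climbs in the outer shell.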
Rsym (and Rmerge) used to be useful things to quote back when people
applied a 3-sigma cutoff to their raw observation data. Seems like a
borderline criminal thing to do nowadays (and it is), but in the dark
ages before maximum likelihood the only way to keep a least-squares
refinement package from chasing noise was to make sure you didn't
confuse it with a ton of weak (noisy) data.
All "R" statistics are supposed to be measuring one type of error (R
is for residual). Rmerge is supposed to measure non-isomorphism.
Rsym is supposed to measure deviation from true symmetry. Rcryst and
Rfree measure the "incorrectness" of your model.
The absolute value of "R" statistics is only meaningful if you can
normalize out the contribution of other sources of error. Weak data
have more random noise than strong data, and the more high-resolution
data you include, the more weak data you will have. Applying a
3-sigma cutoff eliminates any spots measured with more than ~33% error
(if you believe your sigmas). The remaining strong spots have
relatively little random error (from counting statistics), so the
3-sigma cutoff tends to "normalize" data collected from one crystal or
another. However, if you apply a 3-sigma cutoff, you will have fewer
and fewer spots as you go out to high resolution. This is why
"completeness" became a criterion for the high-resolution limit.
Anyway, in summary: I say don't worry about your Rmerge in the high
resolution shell. I/sd is much more meaningful. Just be careful to
optimize your error model (SDCORR in scala, error_scale_factor and
estimated_error in scalepack) so that your scatter/sigma values in the
scala log are close to one (or the final "Chi^2" in scalepack). As
for where you should cut off your data: I use I/sd of 1.5, mainly
because it is a "compromise" between 1.0 (signal = noise) and 2.0
(signal = 2x noise).
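One way to apply the I/sd = 1.5 rule of thumb is to walk the resolution shells from low to high and stop at the last shell whose mean I/sd still clears the threshold. A rough sketch with invented per-shell statistics (real programs report these in the scala/scalepack logs):

```python
def pick_resolution_cutoff(shells, threshold=1.5):
    """shells: (d_min_in_A, mean_I_over_sigma) tuples ordered from low
    to high resolution.  Returns the d_min of the last shell whose mean
    I/sd still meets the threshold, or None if even the first fails."""
    best = None
    for d_min, i_over_sig in shells:
        if i_over_sig < threshold:
            break
        best = d_min
    return best

# Invented per-shell statistics for illustration.
shells = [(3.0, 20.1), (2.5, 9.4), (2.2, 4.0), (2.0, 1.8), (1.9, 1.2)]
print(pick_resolution_cutoff(shells))  # -> 2.0
```

With threshold=2.0 the same data would be cut at 2.2 A, which shows how sensitive the quoted "resolution" is to the chosen criterion.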
As a comment: I fear that the recent rash of structures with I/sd of 6
or 8 in the outer resolution shell is happening because Rfree is also
subject to the unfortunate feature of "R" statistics mentioned above:
you get a lower Rcryst and Rfree if you are willing to sacrifice a
little "resolution". I guess it is just too tempting to play with
your resolution limit when you run out of model building ideas and
your Rfree is still too high. This is a BAD BAD thing to do. BAD!!
Better to calculate an Rfree using only data with F/sd > 3 (note it as
such!), and have the decency to deposit all your structure factors.
-James Holton
MAD Scientist
Bart Hazes wrote:
Hi Ashima,
With these statistics you shouldn't have to worry about reviewers, it
looks perfectly sensible. Actually I'm much more concerned about the
recent epidemic of overly pessimistic resolution cutoffs. In our
journal club, at least half the papers have I/SigI in the highest
resolution bin in the 3-6 range, which means they could have gotten
significantly higher resolution. There are situations where data
quality is more important than resolution, for instance (anomalous)
phasing, but I see the same with many native data sets.
It is not clear to me whether people are placing the detector too far
from the crystal, and thus not even measuring the highest resolution
data, or whether they just elect not to process those data. Why??? To
get nicer-looking statistics???? That would be VERY bad practice!!!
A kinder view is that the detector distance is set based on the
apparent resolution of the first image(s), which underestimates the
true resolution of a high-redundancy data set. If you don't need a
long detector distance to resolve spots, I prefer to select a distance
where my visible diffraction uses the central 80-90% of the detector,
allowing mosflm to try to extract some sensible information from
beyond what the eye can see.
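Bart's 80-90% rule can be turned into a number with Bragg's law: a spot at resolution d diffracts at 2*theta = 2*asin(lambda/(2d)) and lands at radius r = D*tan(2*theta) on a flat detector normal to the beam. A small sketch (the 165 mm detector radius and 1.0 A wavelength are assumptions for illustration):

```python
import math

def detector_distance(d_target_A, wavelength_A, radius_mm):
    """Crystal-to-detector distance (mm) that places resolution
    d_target at the given radius on a flat detector normal to the beam:
    2*theta = 2*asin(lambda / (2*d)),  r = D * tan(2*theta)."""
    two_theta = 2.0 * math.asin(wavelength_A / (2.0 * d_target_A))
    return radius_mm / math.tan(two_theta)

# Place 2.0 A spots at 85% of a hypothetical 165 mm detector radius,
# with a 1.0 A beam.
print(round(detector_distance(2.0, 1.0, 0.85 * 165.0), 1))  # -> 253.5
```

Moving in closer than this distance pushes the visible resolution limit inside the detector edge, leaving room for the beyond-the-eye data Bart mentions.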
This looks like something James Holton may have looked at. If so I'd
be interested to hear if he or the elves have come up with a magic rule.
Bart
Ashima Bagaria wrote:
Hi all,
In regards to my CCP4 question about acceptable Rmerge values in the
last resolution shell... various other parameters pertaining to the
protein data at 3.5 A are:
I/sigmaI = 13.1 (2.3)
%completeness = 95.7 (96.8)
multiplicity = 3.8
All suggestions are welcome.
Regards,
ashima