Date: Tue, 31 Jan 2012 09:28:49 +0000
From: CCP4 bulletin board <[email protected]> (on behalf of Randy Read <[email protected]>)
Subject: Re: [ccp4bb] Reasoning for Rmeas or Rpim as Cutoff
To: [email protected]
Hi Frank,
Now that I've been forced to recalibrate my measure
of "old literature" (apparently it's not just
literature dating from before you started your
PhD)...
This idea has been used occasionally, but I think it
might be becoming more relevant as more structures
are done at low resolution. To be honest, my
criteria for resolution tend to be flexible -- if a
crystal diffracts to 2.2A resolution by the
I/sig(I)>2 criterion, there's less incentive to
worry about it -- pushing it to 2A probably won't
make much difference to the biological questions
that can be answered. But if the 2-sigma criterion
says that the crystal diffracts to 3.3A, then I'm
much more likely to see how much more can
justifiably be squeezed out of it. Actually, I'm
more inclined to start from a 1-sigma cutoff in the
first instance, trusting the maximum likelihood
methods to deal appropriately with the uncertainty.
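A minimal sketch of the shell-based I/sig(I) criterion described above, assuming per-shell mean I/sig(I) values are already available. The cutoff is the high-resolution edge of the last shell, scanning outwards, whose mean still meets the threshold. The `Shell` type, the function name and the shell values are hypothetical illustrations, not output from any data-processing program.

```python
from dataclasses import dataclass


@dataclass
class Shell:
    d_min: float            # high-resolution edge of the shell, in Å
    mean_i_over_sig: float  # mean I/sig(I) of the reflections in the shell


def resolution_limit(shells, threshold):
    """Scan shells from low to high resolution and return the d_min of the
    last shell whose mean I/sig(I) is still >= threshold (None if none is)."""
    limit = None
    for shell in sorted(shells, key=lambda s: s.d_min, reverse=True):
        if shell.mean_i_over_sig < threshold:
            break  # stop at the first shell that fails the criterion
        limit = shell.d_min
    return limit


# Hypothetical shell statistics, for illustration only.
shells = [
    Shell(4.0, 8.5), Shell(3.6, 4.1), Shell(3.3, 2.2),
    Shell(3.1, 1.4), Shell(2.9, 1.05), Shell(2.7, 0.6),
]
print(resolution_limit(shells, 2.0))  # 2-sigma criterion -> 3.3
print(resolution_limit(shells, 1.0))  # 1-sigma starting point -> 2.9
```

Dropping the threshold from 2 to 1 moves the nominal limit outwards; whether the extra shells carry useful signal is what the cross-validation test discussed below is meant to decide.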
It's not too hard to do the higher-resolution
cross-validation, but there are a number of things
to worry about. First, I remember being told that
data processing programs will do a better job of
learning the profiles if you only give them data to
a resolution where there are real spots, so you
probably don't always want to integrate to an
arbitrarily high resolution, unless you're willing
to go back to the integration after reassessing the
resolution cutoff. Maybe, as a standard protocol,
one could integrate to a conservative resolution and
a much higher resolution, then use the conservative
data set for initial work and the higher resolution
data set for evaluating the optimal resolution
cutoff -- and then reintegrate one more time later
at that resolution, using those data for the rest of
the structure determination.
The idea of using the SigmaA curve from Refmac has
come up, but SigmaA curves from cross-validation
data will have a problem. In order to get these to
behave (with a small number of reflections per
resolution bin), you need to smooth the curve in
some way. Refmac does this by fitting a functional
form, so the high-resolution SigmaA values are bound
to drop off smoothly regardless of the real
structure factor agreement. If you're evaluating
resolution immediately after molecular replacement
with a good model, then you could use my old SIGMAA
program to get independent SigmaA values for
individual resolution bins, using all the data
(because there's no danger of over-fitting).
However, if you start out with a poor model or
solve the structure by experimental phasing, you'll
have to do some building and refinement before you
have a model good enough to compare with the
higher-resolution data. Then you want to compare
the fit of the cross-validation data up to the
resolution cutoff used in refinement to the
resolution-dependent fit of all the higher
resolution data not used in refinement. I'd
probably do that, at the moment, by using sftools to
select all the data that haven't been used in
refinement then calculate correlation coefficients
in resolution bins (which are probably as good for
this purpose as SigmaA values). (For
non-aficionados of sftools, the selection could be
done by selecting the reflections with d-spacing
greater than or equal to dmin for your refinement
(i.e. within the refinement resolution range),
selecting the subset of those that are in the
working set, then inverting the selection to get
everything not used in refinement.)
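The selection above amounts to a simple per-reflection test. The sketch below is plain Python, not sftools (its command syntax is not reproduced here), and assumes each reflection carries its d-spacing, a free-set flag, Fo and Fc; the `Reflection` type and function names are hypothetical. It keeps everything not used in refinement (free-set reflections inside dmin plus all reflections beyond it) and reports the Fo/Fc correlation in roughly equal-count resolution bins.

```python
import math
from dataclasses import dataclass


@dataclass
class Reflection:
    d: float    # d-spacing in Å
    free: bool  # True if flagged for the cross-validation (free) set
    fo: float   # observed amplitude
    fc: float   # amplitude calculated from the model


def not_used_in_refinement(refl, d_min_refine):
    # Used in refinement only if in the working set AND within the
    # refinement resolution limit (d >= d_min_refine); invert that.
    used = (not refl.free) and refl.d >= d_min_refine
    return not used


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def binned_cc(reflections, d_min_refine, n_bins):
    """Return (bin d_min, count, CC(Fo, Fc)) for held-out reflections,
    split into roughly equal-count bins from low to high resolution."""
    held_out = [r for r in reflections if not_used_in_refinement(r, d_min_refine)]
    held_out.sort(key=lambda r: r.d, reverse=True)
    if not held_out:
        return []
    size = math.ceil(len(held_out) / n_bins)
    result = []
    for start in range(0, len(held_out), size):
        chunk = held_out[start:start + size]
        cc = pearson([r.fo for r in chunk], [r.fc for r in chunk])
        result.append((chunk[-1].d, len(chunk), cc))
    return result
```

Because only free-set reflections are kept inside dmin, the curve should not show the artificial step at the refinement cutoff that working-set reflections would produce.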
Regards,
Randy
On 30 Jan 2012, at 10:03, Frank von Delft wrote:
Hi Randy - thank you for a very interesting
reminder of the old literature.
I'm intrigued: how come this apparently excellent
idea has not become standard best practice in the
14 years since it was published?
phx
On 30/01/2012 09:40, Randy Read wrote:
Hi,
Here are a couple of links on the idea of
judging resolution by a type of cross-validation
with data not used in refinement:
Ling et al, 1998: http://pubs.acs.org/doi/full/10.1021/bi971806n
Brunger et al, 2009: http://journals.iucr.org/d/issues/2009/02/00/ba5131/index.html
(cites earlier relevant papers from Brunger's group)
Best wishes,
Randy Read
On 30 Jan 2012, at 07:09, arka chakraborty wrote:
Hi all,
In the context of the above ongoing discussion,
can anybody post links for a few relevant
articles?
Thanks in advance,
ARKO
On Mon, Jan 30, 2012 at 3:05 AM, Randy Read <[email protected]> wrote:
Just one thing to add to that very detailed
response from Ian.
We've tended to use a slightly different
approach to determining a sensible
resolution cutoff, where we judge whether
there's useful information in the highest
resolution data by whether it agrees with
calculated structure factors computed from a
model that hasn't been refined against those
data. We first did this with the complex of
the Shiga-like toxin B-subunit pentamer with
the Gb3 trisaccharide (Ling et al, 1998).
From memory, the point where the average
I/sig(I) drops below 2 was around 3.3A.
However, we had a good molecular
replacement model to solve this structure
and, after just carrying out rigid-body
refinement, we computed a SigmaA plot using
data to the edge of the detector (somewhere
around 2.7A, again from memory). The SigmaA
plot dropped off smoothly to 2.8A
resolution, with values well above zero
(indicating significantly better than random
agreement), then dropped suddenly. So we
chose 2.8A as the cutoff. Because there
were four pentamers in the asymmetric unit,
we could then use 20-fold NCS averaging,
which gave a fantastic map. In this case,
the averaging certainly helped to pull out
something very useful from a very weak
signal, because the maps weren't nearly as
clear at lower resolution.
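The choice of 2.8A above rests on reading the SigmaA plot for the point where a smooth fall-off turns into a sudden drop. One hypothetical way to make that reading explicit: walk outwards through the per-bin values (SigmaA, or an Fo/Fc correlation) and stop at the first bin that either falls below a floor or drops to less than a set fraction of the previous bin. The `floor` and `max_ratio` defaults and the example curve are invented for illustration; they are not the values from Ling et al.

```python
def cutoff_before_sudden_drop(curve, max_ratio=0.5, floor=0.05):
    """curve: (d_min, value) pairs for resolution bins, low to high resolution.

    Returns the d_min of the last bin before the values either fall below
    `floor` or drop to less than `max_ratio` times the previous bin's value.
    """
    cutoff = None
    previous = None
    for d_min, value in curve:
        if value < floor:
            break  # no better than random agreement
        if previous is not None and value < max_ratio * previous:
            break  # sudden drop relative to the smooth trend
        cutoff = d_min
        previous = value
    return cutoff


# Invented SigmaA-like curve: smooth fall-off, then a sudden drop.
curve = [(4.0, 0.85), (3.5, 0.72), (3.2, 0.60), (3.0, 0.48),
         (2.8, 0.38), (2.75, 0.08), (2.7, 0.03)]
print(cutoff_before_sudden_drop(curve))  # -> 2.8
```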
Since then, a number of other people have
applied similar tests. Notably, Axel
Brunger has done some careful analysis to
show that it can indeed be useful to take
data beyond the conventional limits.
When you don't have a great MR model, you
can do something similar by limiting the
resolution for the initial refinement and
rebuilding, then assessing whether there's
useful information at higher resolution by
using the improved model (which hasn't seen
the higher resolution data) to compute
Fcalcs. By the way, it's not necessary to
use a SigmaA plot -- the correlation between
Fo and Fc probably works just as well. Note
that, when the model has been refined
against the lower resolution data, you'll
expect a drop in correlation at the
resolution cutoff you used for refinement,
unless you only use the cross-validation
data for the resolution range used in
refinement.
-----
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research    Tel: +44 1223 336500
Wellcome Trust/MRC Building                 Fax: +44 1223 336827
Hills Road                                  E-mail: [email protected]
Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk
On 29 Jan 2012, at 17:25, Ian Tickle wrote:
> Jacob, here's my (personal) take on this:
>
> The data quality metrics that everyone uses clearly fall into 2
> classes: 'consistency' metrics, i.e. Rmerge/meas/pim and CC(1/2),
> which measure how well redundant observations agree, and signal/noise
> ratio metrics, i.e. mean(I/sigma) and completeness, which relate to
> the information content of the data.
>
> IMO the basic problem with all the consistency metrics is that they
> are not measuring the quantity that is relevant to refinement and
> electron density maps, namely the information content of the data, at
> least not in a direct and meaningful way. This is because there are 2
> contributors to any consistency metric: the systematic errors (e.g.
> differences in illuminated volume and absorption) and the random
> errors (from counting statistics, detector noise etc.). If the data
> are collected with sufficient redundancy, the systematic errors should
> hopefully largely cancel, and therefore only the random errors will
> determine the information content. Therefore the systematic error
> component of the consistency measure (which I suspect is the biggest
> component, at least for the strong reflections) is not relevant to
> measuring the information content. If the consistency measure only
> took into account the random error component (which it can't), then it
> would essentially be a measure of information content, if only
> indirectly (but then why not simply use a direct measure such as the
> signal/noise ratio?).
>
> There are clearly at least 2 distinct problems with Rmerge: first, it
> includes systematic errors in its measure of consistency; second, it's
> not invariant with respect to the redundancy (and third, it's useless
> as a statistic anyway because you can't do any significance tests on
> it!). The redundancy problem is fixed to some extent with Rpim etc.,
> but that still leaves the other problems. It's not clear to me that
> CC(1/2) is any better in this respect, since (as far as I understand
> how it's implemented) one cannot be sure that the systematic errors
> will cancel for each half-dataset Imean, so it's still likely to
> contain a large contribution from the irrelevant systematic error
> component and so mislead in respect of the real data quality in
> exactly the same way that Rmerge/meas/pim do. One may as well use the
> Rmerge between the half-dataset Imeans, since there would be no
> redundancy effect (i.e. the redundancy would be 2 for all included
> reflections).
>
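To make the redundancy point concrete, here is a sketch using the standard definitions, where reflection h has n_h observations I_hi with mean <I_h>: Rmerge sums |I_hi - <I_h>|, Rmeas weights each reflection's sum by sqrt(n_h/(n_h - 1)) and Rpim by sqrt(1/(n_h - 1)), all divided by the sum of the I_hi (singly measured reflections excluded). The simulated data are hypothetical: random intensities measured with purely random Gaussian errors at redundancy 2 and at redundancy 8. Rmerge rises with redundancy, Rmeas stays roughly constant and Rpim falls, although nothing about the crystal has changed.

```python
import math
import random


def r_factors(groups):
    """groups: list of lists, the repeated intensity measurements of each
    unique reflection. Returns (Rmerge, Rmeas, Rpim)."""
    num_merge = num_meas = num_pim = denom = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue  # a single measurement says nothing about consistency
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev
        num_pim += math.sqrt(1 / (n - 1)) * dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom, num_pim / denom


def simulate(redundancy, n_refl=2000, sigma=10.0, seed=1):
    """Hypothetical data: random true intensities, Gaussian noise only."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_refl):
        true_i = rng.uniform(50.0, 500.0)
        groups.append([rng.gauss(true_i, sigma) for _ in range(redundancy)])
    return groups


for n in (2, 8):
    rmerge, rmeas, rpim = r_factors(simulate(n))
    print(f"redundancy {n}: Rmerge={rmerge:.4f} Rmeas={rmeas:.4f} Rpim={rpim:.4f}")
```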
> I did some significance tests on CC(1/2) and got silly results: for
> example, the test says that the significance level for the CC is
> ~0.1, but this corresponded to a huge Rmerge (200%) and a tiny
> mean(I/sigma) (0.4). It seems that (without any basis in statistics
> whatsoever) the rule of thumb CC > 0.5 is what is generally used, but
> I would be worried that the statistics are so far divorced from
> reality; it suggests that something is seriously wrong with the
> assumptions!
>
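For reference, a sketch of how CC(1/2) is usually computed (each reflection's observations split at random into two halves, the half-dataset means correlated across reflections), together with one standard significance test for a correlation coefficient, the Fisher z-transform, under which atanh(r) * sqrt(N - 3) is approximately standard normal when the true correlation is zero. This is not necessarily the test Ian ran; it only shows why a CC of about 0.1 can already be "significant" in a shell of 1000 reflections. Function names and the choice of z_crit are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cc_half(groups, rng):
    """Randomly split each reflection's observations into two halves and
    correlate the two half-dataset mean intensities over all reflections."""
    xs, ys = [], []
    for obs in groups:
        if len(obs) < 2:
            continue
        shuffled = list(obs)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first, second = shuffled[:half], shuffled[half:]
        xs.append(sum(first) / len(first))
        ys.append(sum(second) / len(second))
    return pearson(xs, ys)


def critical_cc(n_reflections, z_crit=3.09):
    """Smallest CC significant at one-sided p ~ 0.001 (z_crit = 3.09),
    using the Fisher transform atanh(r) * sqrt(N - 3) ~ N(0, 1) under H0."""
    return math.tanh(z_crit / math.sqrt(n_reflections - 3))


print(f"{critical_cc(1000):.3f}")  # about 0.098 for a 1000-reflection shell
```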
> Having said all that, the mean(I/sigma) metric, which on the face of
> it is much more closely related to the information content and
> therefore should be a more relevant metric than Rmerge/meas/pim &
> CC(1/2), is not without its own problems (which probably explains the
> continuing popularity of the other metrics!). First and most obvious,
> it's hostage to the estimate of sigma(I) used. I've never been happy
> with inflating the counting sigmas to include effects of systematic
> error based on the consistency of redundant measurements, since, as I
> indicated above, if the data are collected redundantly in such a way
> that the systematic errors largely cancel, it implies that the
> systematic errors should not be included in the estimate of sigma.
> The fact that the sigma(I)'s would then generally be smaller (at least
> for the large I's), so that the sample variances would be much larger
> than the counting variances, is irrelevant, because the former
> include the systematic errors. Also, the I/sigma cut-off used would
> probably not need to be changed, since it affects only the weakest
> reflections, which are largely unaffected by the systematic error
> correction.
>
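The claim that sigma inflation barely touches the weakest reflections can be checked with a simple error model of the form sigma' = SdFac * sqrt(sigma^2 + (SdAdd * I)^2), loosely modelled on the kind of correction scaling programs such as SCALA/Aimless apply (the real corrections include further terms, which this sketch omits). The parameter values are hypothetical. Because the added term grows with I, it collapses I/sigma for strong reflections while leaving weak ones almost unchanged.

```python
import math


def inflated_sigma(i, sigma, sd_fac=1.0, sd_add=0.05):
    """Hypothetical error model: scale the counting sigma and add a term
    proportional to the intensity (a stand-in for systematic error)."""
    return sd_fac * math.sqrt(sigma ** 2 + (sd_add * i) ** 2)


for label, i, sigma in [("weak", 20.0, 10.0), ("strong", 10000.0, 100.0)]:
    before = i / sigma
    after = i / inflated_sigma(i, sigma)
    print(f"{label}: I/sigma {before:.2f} -> {after:.2f}")
```

With these values the weak reflection stays at I/sigma of about 2, so an I/sigma > 2 cut-off is essentially unaffected, while the strong reflection drops from 100 to under 20.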
> The second problem with mean(I/sigma) is also obvious: it's a mean,
> and as such it's rather insensitive to the actual distribution of
> I/sigma(I). For example, if a shell contained a few highly significant
> intensities, these could be overwhelmed by a large number of weak data
> and give an insignificant mean(I/sigma). It seems to me that one
> should be considering the significance of individual reflections, not
> the shell averages. Also, the average will depend on the width of the
> resolution bin, so one will get the strange effect that the apparent
> resolution will depend on how one bins the data! The assumption being
> made in taking the bin average is that I/sigma(I) falls off smoothly
> with d*, but that's unlikely to be the reality.
>
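A small illustration of the point about shell averages. In a hypothetical shell of 1000 reflections where 20 have I/sigma = 5 and the rest are weak, the mean I/sigma is well under 1, yet 20 individually significant reflections is far more than the roughly 1.35 expected above 3 sigma if every reflection were pure noise. The numbers are invented, and treating I/sigma as N(0, 1) under the null hypothesis is an assumption.

```python
import math


def shell_summary(i_over_sig, z=3.0):
    """Return (mean I/sigma, number above z, number expected above z if
    every reflection were pure Gaussian noise with I/sigma ~ N(0, 1))."""
    n = len(i_over_sig)
    mean = sum(i_over_sig) / n
    n_above = sum(1 for v in i_over_sig if v > z)
    p_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided P(X > z)
    return mean, n_above, n * p_tail


shell = [5.0] * 20 + [0.2] * 980  # invented shell contents
mean, n_above, expected = shell_summary(shell)
print(f"mean I/sigma = {mean:.3f}, above 3 sigma: {n_above}, "
      f"expected by chance: {expected:.2f}")
```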
> It seems to me that a chi-square statistic which takes into account
> the actual distribution of I/sigma(I) would be a better bet than the
> bin average, though it's not entirely clear how one would formulate
> such a metric. One would have to consider subsets of the data as a
> whole sorted by increasing d* (i.e. not in resolution bins, to avoid
> the 'bin averaging effect' described above), and apply the resolution
> cut-off where the chi-square statistic has maximum probability. This
> would automatically take care of incompleteness effects, since all
> unmeasured reflections would be included with I/sigma = 0 just for the
> purposes of working out the cut-off point. I've skipped the details
> of implementation and I've no idea how it would work in practice!
>
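Since the formulation is left open above, the following is only one possible reading, not Ian's method. Reflections are sorted by d* (no bins), unmeasured reflections count as I/sigma = 0, and for each candidate cut-off the sketch asks whether all reflections beyond it are jointly consistent with pure noise, i.e. whether the sum of (I/sigma)^2 over k reflections is compatible with a chi-square distribution on k degrees of freedom. The cut-off is placed at the first reflection from which that holds. The upper-tail probability uses the Wilson-Hilferty normal approximation to avoid a SciPy dependency; `alpha`, the function names and the test data are hypothetical.

```python
import math


def chi2_upper_p(x, k):
    """Wilson-Hilferty approximation to P(chi-square with k d.o.f. >= x)."""
    h = 2.0 / (9.0 * k)
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chi2_cutoff(d_star, i_over_sig, alpha=0.01):
    """Return the d* of the first reflection (in order of increasing d*) from
    which every reflection outwards is consistent with pure noise, or None.
    Unmeasured reflections should be passed in with I/sigma = 0."""
    order = sorted(range(len(d_star)), key=lambda j: d_star[j])
    squares = [i_over_sig[j] ** 2 for j in order]
    # tail[s] = sum of (I/sigma)^2 over sorted reflections s, s+1, ..., end
    tail = [0.0] * (len(squares) + 1)
    for s in range(len(squares) - 1, -1, -1):
        tail[s] = tail[s + 1] + squares[s]
    for start in range(len(squares)):
        k = len(squares) - start
        if chi2_upper_p(tail[start], k) >= alpha:
            return d_star[order[start]]
    return None


# Invented data: 500 reflections with I/sigma = 4, then 500 pure-noise ones.
d_star = [0.1 + 0.001 * j for j in range(1000)]
i_over_sig = [4.0 if j < 500 else (1.0 if j % 2 == 0 else -1.0)
              for j in range(1000)]
print(chi2_cutoff(d_star, i_over_sig))  # a few reflections inside d* = 0.6
```

Including unmeasured reflections as zeros lowers the chi-square of the outer data, so an incomplete outer region is more readily classed as noise, which is one way of reading the "automatically take care of incompleteness" remark.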
> An obvious question is: do we really need to worry about the exact
> cut-off anyway? Won't our sophisticated maximum likelihood refinement
> programs handle the weak data correctly? Note that in theory weak
> intensities should be handled correctly; however, the problem may
> instead lie with incorrectly estimated sigmas: these are obviously
> much more of an issue for any software which depends critically on
> accurate estimates of uncertainty! I did some tests where I refined
> data for a known protein-ligand complex using the original apo model,
> and looked at the difference density for the ligand, using data cut at
> 2.5, 2 and 1.5 Ang, where the standard metrics strongly suggested there
> was only data to 2.5 Ang.
>
> I have to say that the differences were tiny, well below what I would
> deem significant (i.e. not only the map resolutions but all the map
> details were essentially the same), and certainly I would question
> whether it was worth all the soul-searching on this topic over the
> years! So it seems that the refinement programs do indeed handle weak
> data correctly, but I guess this should hardly come as a surprise (but
> well done to the software developers anyway!). This was actually
> using Buster: Refmac seems to have more of a problem with scaling &
> TLS if you include a load of high resolution junk data. However,
> before anyone acts on this information I would _very_ strongly advise
> them to repeat the experiment and verify the results for themselves!
> The bottom line may be that the actual cut-off used only matters for
> the purpose of quoting the true resolution of the map, but it doesn't
> significantly affect the appearance of the map itself.
>
> Finally, an effect which confounds all the quality metrics is data
> anisotropy: ideally the cut-off surface of significance in reciprocal
> space should perhaps be an ellipsoid, not a sphere. I know there are
> several programs for anisotropic scaling, but I'm not aware of any
> that apply anisotropic resolution cutoffs (or even whether this would
> be advisable).
>
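The ellipsoidal cut-off mentioned above is easy to state in the simplest case. The sketch assumes an orthogonal reciprocal lattice (orthorhombic or higher symmetry) with the ellipsoid's principal axes along a*, b* and c*, so a reflection's d* components are h/a, k/b and l/c, and each axis gets its own resolution limit. The function name, cell and limits are hypothetical, and whether such a cut-off is advisable remains the open question raised above.

```python
def inside_ellipsoid(hkl, cell, d_min_axes):
    """True if reflection hkl lies within an ellipsoidal resolution limit.

    cell:       (a, b, c) edges in Å of an orthogonal cell (assumption).
    d_min_axes: (d_a, d_b, d_c), the resolution limit in Å along a*, b*, c*.
    Along each axis the d* component is h/a (etc.) and the limit is 1/d_a,
    so the scaled component is (h/a) / (1/d_a) = h * d_a / a.
    """
    h, k, l = hkl
    a, b, c = cell
    d_a, d_b, d_c = d_min_axes
    return (h * d_a / a) ** 2 + (k * d_b / b) ** 2 + (l * d_c / c) ** 2 <= 1.0


cell = (50.0, 60.0, 70.0)
limits = (2.0, 2.0, 3.0)  # weaker diffraction along c*
print(inside_ellipsoid((20, 0, 0), cell, limits))  # d = 2.5 Å along a*: True
print(inside_ellipsoid((0, 0, 25), cell, limits))  # d = 2.8 Å along c*: False
```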
> Cheers
>
> -- Ian
>
> On 27 January 2012 17:47, Jacob Keller <[email protected]> wrote:
>> Dear Crystallographers,
>>
>> I cannot think why any of the various flavors of Rmerge/meas/pim
>> should be used as a data cutoff and not simply I/sigma--can somebody
>> make a good argument or point me to a good reference? My thinking is
>> that signal:noise of >2 is definitely still signal, no matter what the
>> R values are. Am I wrong? I was thinking also possibly the R value
>> cutoff was a historical accident/expedient from when one tried to
>> limit the amount of data in the face of limited computational
>> power--true? So perhaps now, when the computers are so much more
>> powerful, we have the luxury of including more weak data?
>>
>> JPK
>>
>>
>> --
>> *******************************************
>> Jacob Pearson Keller
>> Northwestern University
>> Medical Scientist Training Program
>> email: [email protected]
>> *******************************************
--
ARKA CHAKRABORTY
CAS in Crystallography and Biophysics
University of Madras
Chennai,India