I am concerned about one basic problem with using sigmaA to judge the
quality of the data and, in particular, data cut-offs: sigmaA depends on
a model! Judging data quality with a model that was derived from those very data is worrying.

SigmaA is by definition the correlation between the (unknown) true structure factor and the calculated structure factor. A low sigmaA value can mean a poor model, poor data, or both, and one cannot differentiate between these causes. Of course, expertise in looking at maps and their fit to a model can lead one to conclude (subjectively) that a model is as good as it can get and fully refined, and that a map extends to a given resolution, and then to use a sigmaA plot to decide on data quality and cut-offs, but this seems questionable.

And, as alluded to earlier, sigmaA is an estimated parameter, and there are many ways of estimating it, each with its own strengths and weaknesses. For example, a likelihood function that uses measurement errors can model disagreement between observed and calculated structure factors (in particular, with the large measurement errors that one sees near the high-resolution limit), and this can lead to an 'optimistic' sigmaA estimate that can be used to justify an incorrectly optimistic data cut-off. Given all the problems and instability that can occur in sigmaA estimation, I would think a better estimate of the agreement between observed and calculated structure factor amplitudes is R-free as a function of resolution; this can be used together with I/sigma and Rpim/meas to provide insight into the resolution to which model and data agree. Perhaps, instead of talking about data cut-offs, a better way of putting it is: to what resolution are our model and data in agreement?
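[Editor's note: the "agreement as a function of resolution" idea above amounts to computing an R-factor (or R-free, if restricted to the free set) in resolution shells. A minimal numpy sketch, with illustrative function and variable names, assuming arrays of d-spacings and structure factor amplitudes:]

```python
import numpy as np

def r_factor_by_shell(d_spacing, f_obs, f_calc, n_shells=10):
    """R-factor sum|Fo - Fc| / sum(Fo) in resolution shells.

    Shells are chosen with roughly equal reflection counts by sorting
    on 1/d^3 (proportional to reciprocal-space volume).
    Returns (shell_dmin, shell_r), lowest resolution shell first.
    """
    s3 = 1.0 / d_spacing**3
    order = np.argsort(s3)                 # low to high resolution
    shells = np.array_split(order, n_shells)
    dmins, rs = [], []
    for shell in shells:
        fo, fc = f_obs[shell], f_calc[shell]
        rs.append(np.sum(np.abs(fo - fc)) / np.sum(fo))
        dmins.append(d_spacing[shell].min())
    return np.array(dmins), np.array(rs)
```

A sharp rise in the per-shell R at some resolution is then the point beyond which model and data no longer agree.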

On Tue, 31 Jan 2012, Phoebe Rice wrote:

I'm enjoying this discussion.
It also seems like a good spot to inject my standard plea for better treatment of 
anisotropy in things like "table 1" of papers and PDB deposition forms.  When 
you have Ewald's football (American football), like many nucleic acid-ophiles do, one 
number simply isn't enough to describe the data.

=====================================
Phoebe A. Rice
Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.rsc.org/shop/books/2008/9780854042722.asp


---- Original message ----
Date: Tue, 31 Jan 2012 09:28:49 +0000
From: CCP4 bulletin board <[email protected]> (on behalf of Randy Read 
<[email protected]>)
Subject: Re: [ccp4bb] Reasoning for Rmeas or Rpim as Cutoff
To: [email protected]

  Hi Frank,
  Now that I've been forced to recalibrate my measure
  of "old literature" (apparently it's not just
  literature dating from before you started your
  PhD)...
  This idea has been used occasionally, but I think it
  might be becoming more relevant as more structures
  are done at low resolution.  To be honest, my
  criteria for resolution tend to be flexible -- if a
  crystal diffracts to 2.2A resolution by the
  I/sig(I)>2 criterion, there's less incentive to
  worry about it -- pushing it to 2A probably won't
  make much difference to the biological questions
  that can be answered.  But if the 2-sigma criterion
  says that the crystal diffracts to 3.3A, then I'm
  much more likely to see how much more can
  justifiably be squeezed out of it.  Actually, I'm
  more inclined to start from a 1-sigma cutoff in the
  first instance, trusting the maximum likelihood
  methods to deal appropriately with the uncertainty.
  It's not too hard to do the higher-resolution
  cross-validation, but there are a number of things
  to worry about.  First, I remember being told that
  data processing programs will do a better job of
  learning the profiles if you only give them data to
  a resolution where there are real spots, so you
  probably don't always want to integrate to an
  arbitrarily high resolution, unless you're willing
  to go back to the integration after reassessing the
  resolution cutoff.  Maybe, as a standard protocol,
  one could integrate to a conservative resolution and
  a much higher resolution, then use the conservative
  data set for initial work and the higher resolution
  data set for evaluating the optimal resolution
  cutoff -- and then reintegrate one more time later
  at that resolution, using those data for the rest of
  the structure determination.
  The idea of using the SigmaA curve from Refmac has
  come up, but SigmaA curves from cross-validation
  data will have a problem.  In order to get these to
  behave (with a small number of reflections per
  resolution bin), you need to smooth the curve in
  some way.  Refmac does this by fitting a functional
  form, so the high-resolution SigmaA values are bound
  to drop off smoothly regardless of the real
  structure factor agreement.  If you're evaluating
  resolution immediately after molecular replacement
  with a good model, then you could use my old SIGMAA
  program to get independent SigmaA values for
  individual resolution bins, using all the data
  (because there's no danger of over-fitting).
   However, if you start out with a poor model or
  solve the structure by experimental phasing, you'll
  have to do some building and refinement before you
  have a model good enough to compare with the
  higher-resolution data.  Then you want to compare
  the fit of the cross-validation data up to the
  resolution cutoff used in refinement to the
  resolution-dependent fit of all the higher
  resolution data not used in refinement.  I'd
  probably do that, at the moment, by using sftools to
  select all the data that haven't been used in
  refinement then calculate correlation coefficients
  in resolution bins (which are probably as good for
  this purpose as SigmaA values).  (For
  non-aficionados of sftools, the selection could be
  done by selecting the reflections with d-spacing
  less than dmin for your refinement, selecting the
  subset of those that are in the working set, then
  inverting the selection to get everything not used
  in refinement.)
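[Editor's note: Randy's recipe — select everything not used in refinement, then compute correlation coefficients in resolution bins — can be sketched in numpy without sftools. This is an illustrative sketch, not sftools syntax; it assumes arrays of d-spacings, amplitudes, and free-set flags:]

```python
import numpy as np

def cc_by_shell(d, f_obs, f_calc, is_free, dmin_refine, n_shells=8):
    """CC(Fobs, Fcalc) in resolution shells, over reflections NOT used
    in refinement: the free set within the refinement limit plus all
    data beyond it (d-spacing < dmin_refine)."""
    unused = is_free | (d < dmin_refine)
    du, fo, fc = d[unused], f_obs[unused], f_calc[unused]
    order = np.argsort(-du)               # low to high resolution
    shells = np.array_split(order, n_shells)
    out = []
    for shell in shells:
        cc = np.corrcoef(fo[shell], fc[shell])[0, 1]
        out.append((du[shell].min(), cc))
    return out
```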
  Regards,
  Randy
  On 30 Jan 2012, at 10:03, Frank von Delft wrote:

    Hi Randy - thank you for a very interesting
    reminder to old literature.

    I'm intrigued:  how come this apparently excellent
    idea has not become standard best practice in the
    14 years since it was published?

    phx

    On 30/01/2012 09:40, Randy Read wrote:

      Hi,
      Here are a couple of links on the idea of
      judging resolution by a type of cross-validation
      with data not used in refinement:
      Ling et al,
      1998: http://pubs.acs.org/doi/full/10.1021/bi971806n
      Brunger et al,
      2008: http://journals.iucr.org/d/issues/2009/02/00/ba5131/index.html
        (cites earlier relevant papers from Brunger's
      group)
      Best wishes,
      Randy Read
      On 30 Jan 2012, at 07:09, arka chakraborty
      wrote:

        Hi all,

        In the context of the above going discussion
        can anybody post links for a few relevant
        articles?

        Thanks in advance,

        ARKO

        On Mon, Jan 30, 2012 at 3:05 AM, Randy Read
        <[email protected]> wrote:

          Just one thing to add to that very detailed
          response from Ian.

          We've tended to use a slightly different
          approach to determining a sensible
          resolution cutoff, where we judge whether
          there's useful information in the highest
          resolution data by whether it agrees with
          calculated structure factors computed from a
          model that hasn't been refined against those
          data.  We first did this with the complex of
          the Shiga-like toxin B-subunit pentamer with
          the Gb3 trisaccharide (Ling et al, 1998).
           From memory, the point where the average
          I/sig(I) drops below 2 was around 3.3A.
           However, we had a good molecular
          replacement model to solve this structure
          and, after just carrying out rigid-body
          refinement, we computed a SigmaA plot using
          data to the edge of the detector (somewhere
          around 2.7A, again from memory).  The SigmaA
          plot dropped off smoothly to 2.8A
          resolution, with values well above zero
          (indicating significantly better than random
          agreement), then dropped suddenly.  So we
          chose 2.8A as the cutoff.  Because there
          were four pentamers in the asymmetric unit,
          we could then use 20-fold NCS averaging,
          which gave a fantastic map.  In this case,
          the averaging certainly helped to pull out
          something very useful from a very weak
          signal, because the maps weren't nearly as
          clear at lower resolution.

          Since then, a number of other people have
          applied similar tests.  Notably, Axel
          Brunger has done some careful analysis to
          show that it can indeed be useful to take
          data beyond the conventional limits.

          When you don't have a great MR model, you
          can do something similar by limiting the
          resolution for the initial refinement and
          rebuilding, then assessing whether there's
          useful information at higher resolution by
          using the improved model (which hasn't seen
          the higher resolution data) to compute
          Fcalcs.  By the way, it's not necessary to
          use a SigmaA plot -- the correlation between
          Fo and Fc probably works just as well.  Note
          that, when the model has been refined
          against the lower resolution data, you'll
          expect a drop in correlation at the
          resolution cutoff you used for refinement,
          unless you only use the cross-validation
          data for the resolution range used in
          refinement.

           -----
           Randy J. Read
           Department of Haematology, University of Cambridge
           Cambridge Institute for Medical Research   Tel: +44 1223 336500
           Wellcome Trust/MRC Building                Fax: +44 1223 336827
           Hills Road                                 E-mail: [email protected]
           Cambridge CB2 0XY, U.K.                    www-structmed.cimr.cam.ac.uk
          On 29 Jan 2012, at 17:25, Ian Tickle wrote:

         > Jacob, here's my (personal) take on this:
         >
         > The data quality metrics that everyone
          uses clearly fall into 2
         > classes: 'consistency' metrics, i.e.
          Rmerge/meas/pim and CC(1/2) which
         > measure how well redundant observations
          agree, and signal/noise ratio
         > metrics, i.e. mean(I/sigma) and
          completeness, which relate to the
         > information content of the data.
         >
         > IMO the basic problem with all the
          consistency metrics is that they
         > are not measuring the quantity that is
          relevant to refinement and
         > electron density maps, namely the
          information content of the data, at
         > least not in a direct and meaningful way.
           This is because there are 2
         > contributors to any consistency metric:
          the systematic errors (e.g.
         > differences in illuminated volume and
          absorption) and the random
         > errors (from counting statistics, detector
          noise etc.).  If the data
         > are collected with sufficient redundancy
          the systematic errors should
         > hopefully largely cancel, and therefore
          only the random errors will
         > determine the information content.
           Therefore the systematic error
         > component of the consistency measure
          (which I suspect is the biggest
         > component, at least for the strong
          reflections) is not relevant to
         > measuring the information content.  If the
          consistency measure only
         > took into account the random error
          component (which it can't), then it
         > would essentially be a measure of
          information content, if only
         > indirectly (but then why not simply use a
          direct measure such as the
         > signal/noise ratio?).
         >
         > There are clearly at least 2 distinct
          problems with Rmerge, first it's
         > including systematic errors in its measure
          of consistency, second it's
         > not invariant with respect to the
          redundancy (and third it's useless
         > as a statistic anyway because you can't do
          any significance tests on
         > it!).  The redundancy problem is fixed to
          some extent with Rpim etc,
         > but that still leaves the other problems.
           It's not clear to me that
         > CC(1/2) is any better in this respect,
          since (as far as I understand
         > how it's implemented), one cannot be sure
          that the systematic errors
         > will cancel for each half-dataset Imean,
          so it's still likely to
         > contain a large contribution from the
          irrelevant systematic error
         > component and so mislead in respect of the
          real data quality exactly
         > in the same way that Rmerge/meas/pim do.
           One may as well use the
         > Rmerge between the half dataset Imeans,
          since there would be no
         > redundancy effect (i.e. the redundancy
          would be 2 for all included
         > reflections).
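[Editor's note: Ian's suggestion — compare the two half-dataset means directly, so the multiplicity is 2 for every reflection — can be sketched as follows. Illustrative names; the inputs are the per-reflection mean intensities of two random half-datasets:]

```python
import numpy as np

def cc_half(i_half1, i_half2):
    """CC(1/2): Pearson correlation between the mean intensities of
    two random half-datasets, one value per unique reflection."""
    return np.corrcoef(i_half1, i_half2)[0, 1]

def r_half(i_half1, i_half2):
    """Rmerge between the two half-dataset means.  With exactly two
    'observations' per reflection, |I1 - <I>| + |I2 - <I>| = |I1 - I2|,
    so there is no multiplicity dependence."""
    return np.sum(np.abs(i_half1 - i_half2)) / np.sum(i_half1 + i_half2)
```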
         >
         > I did some significance tests on CC(1/2)
          and I got silly results, for
         > example it says that the significance
          level for the CC is ~ 0.1, but
         > this corresponded to a huge Rmerge (200%)
          and a tiny mean(I/sigma)
         > (0.4).  It seems that (without any basis
          in statistics whatsoever) the
         > rule-of-thumb CC > 0.5 is what is
          generally used, but I would be
         > worried that the statistics are so far
          divorced from the reality - it
         > suggests that something is seriously wrong
          with the assumptions!
         >
         > Having said all that, the mean(I/sigma)
          metric, which on the face of
         > it is much more closely related to the
          information content and
         > therefore should be a more relevant metric
          than Rmerge/meas/pim &
         > CC(1/2), is not without its own problems
          (which probably explains the
         > continuing popularity of the other
          metrics!).  First and most obvious,
         > it's a hostage to the estimate of sigma(I)
          used.  I've never been
         > happy with inflating the counting sigmas
          to include effects of
         > systematic error based on the consistency
          of redundant measurements,
         > since as I indicated above if the data are
          collected redundantly in
         > such a way that the systematic errors
          largely cancel, it implies that
         > the systematic errors should not be
          included in the estimate of sigma.
         > The fact that then the sigma(I)'s would
          generally be smaller (at
         > least for the large I's), so the sample
          variances would be much larger
         > than the counting variances, is
          irrelevant, because the former
         > includes the systematic errors.  Also the
          I/sigma cut-off used would
         > probably not need to be changed since it
          affects only the weakest
         > reflections which are largely unaffected
          by the systematic error
         > correction.
         >
         > The second problem with mean(I/sigma) is
          also obvious: i.e. it's a
         > mean, and as such it's rather insensitive
          to the actual distribution
         > of I/sigma(I).  For example if a shell
          contained a few highly
         > significant intensities these could be
          overwhelmed by a large number
         > of weak data and give an insignificant
          mean(I/sigma).  It seems to me
         > that one should be considering the
          significance of individual
         > reflections, not the shell averages.  Also
          the average will depend on
         > the width of the resolution bin, so one
          will get the strange effect
         > that the apparent resolution will depend
         > on how one bins the data!
         > The assumption being made in taking the
          bin average is that I/sigma(I)
         > falls off smoothly with d* but that's
          unlikely to be the reality.
         >
         > It seems to me that a chi-square statistic
          which takes into account
         > the actual distribution of I/sigma(I)
          would be a better bet than the
         > bin average, though it's not entirely
          clear how one would formulate
         > such a metric.  One would have to consider
          subsets of the data as a
         > whole sorted by increasing d* (i.e. not in
          resolution bins to avoid
         > the 'bin averaging effect' described
          above), and apply the resolution
         > cut-off where the chi-square statistic has
          maximum probability.  This
         > would automatically take care of
          incompleteness effects since all
         > unmeasured reflections would be included
          with I/sigma = 0 just for the
         > purposes of working out the cut-off point.
           I've skipped the details
         > of implementation and I've no idea how it
          would work in practice!
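[Editor's note: one possible reading of Ian's chi-square proposal, sketched in numpy under the stated assumptions. If the tail of the data beyond some cutoff is pure noise, I/sigma ~ N(0,1) there, so sum((I/sigma)^2) over n tail reflections is chi-square with n degrees of freedom, approximated below as N(n, 2n) for large n; the cutoff would go where the tail's z-score drops to insignificance. Names and the exact decision rule are illustrative:]

```python
import numpy as np

def noise_z_scores(d_star, i_over_sig):
    """For each candidate cutoff (data sorted by increasing d*), z-score
    of sum((I/sig)^2) over the reflections BEYOND the cutoff, under the
    null hypothesis that they are pure noise.  A large z means the tail
    still carries signal; z near 0 means it looks like noise."""
    order = np.argsort(d_star)
    x = i_over_sig[order] ** 2
    tail = np.cumsum(x[::-1])[::-1]       # tail[k] = sum of x[k:]
    n = np.arange(len(x), 0, -1)          # tail sizes
    return (tail - n) / np.sqrt(2 * n)    # chi2(n) ~ N(n, 2n)
```

Because the statistic is cumulative over sorted data rather than binned, it avoids the bin-averaging effect described above, and unmeasured reflections can be included with I/sigma = 0.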
         >
         > An obvious question is: do we really need
          to worry about the exact
         > cut-off anyway, won't our sophisticated
          maximum likelihood refinement
         > programs handle the weak data correctly?
           Note that in theory weak
         > intensities should be handled correctly,
          however the problem may
         > instead lie with incorrectly estimated
          sigmas: these are obviously
         > much more of an issue for any software
          which depends critically on
         > accurate estimates of uncertainty!  I did
          some tests where I refined
         > data for a known protein-ligand complex
          using the original apo model,
         > and looked at the difference density for
          the ligand, using data cut at
         > 2.5, 2 and 1.5 Ang where the standard
          metrics strongly suggested there
         > was only data to 2.5 Ang.
         >
         > I have to say that the differences were
          tiny, well below what I would
         > deem significant (i.e. not only the map
          resolutions but all the map
         > details were essentially the same), and
          certainly I would question
         > whether it was worth all the
          soul-searching on this topic over the
         > years!  So it seems that the refinement
          programs do indeed handle weak
         > data correctly, but I guess this should
          hardly come as a surprise (but
         > well done to the software developers
          anyway!).  This was actually
         > using Buster: Refmac seems to have more of
          a problem with scaling &
         > TLS if you include a load of high
          resolution junk data.  However,
         > before anyone acts on this information I
          would _very_ strongly advise
         > them to repeat the experiment and verify
          the results for themselves!
         > The bottom line may be that the actual
          cut-off used only matters for
         > the purpose of quoting the true resolution
          of the map, but it doesn't
         > significantly affect the appearance of the
          map itself.
         >
         > Finally an effect which confounds all the
          quality metrics is data
         > anisotropy: ideally the cut-off surface of
          significance in reciprocal
         > space should perhaps be an ellipsoid, not
          a sphere.  I know there are
         > several programs for anisotropic scaling,
          but I'm not aware of any
         > that apply anisotropic resolution cutoffs
          (or even whether this would
         > be advisable).
         >
         > Cheers
         >
         > -- Ian
         >
         > On 27 January 2012 17:47, Jacob Keller
          <[email protected]> wrote:
         >> Dear Crystallographers,
         >>
         >> I cannot think why any of the various
          flavors of Rmerge/meas/pim
         >> should be used as a data cutoff and not
          simply I/sigma--can somebody
         >> make a good argument or point me to a
          good reference? My thinking is
         >> that signal:noise of >2 is definitely
          still signal, no matter what the
         >> R values are. Am I wrong? I was thinking
          also possibly the R value
         >> cutoff was a historical
          accident/expedient from when one tried to
         >> limit the amount of data in the face of
          limited computational
         >> power--true? So perhaps now, when the
          computers are so much more
         >> powerful, we have the luxury of including
          more weak data?
         >>
         >> JPK
         >>
         >>
         >> --
         >>
          *******************************************
         >> Jacob Pearson Keller
         >> Northwestern University
         >> Medical Scientist Training Program
         >> email: [email protected]
         >>
          *******************************************

        --

        ARKA CHAKRABORTY
        CAS in Crystallography and Biophysics
        University of Madras
        Chennai,India


