Bart Hazes wrote:
> There are many cases where people use a structure refined at high
> resolution as a starting molecular replacement structure for a closely
> related/same protein with a lower resolution data set and get substantially
> better R statistics than you would expect for that resolution. So one factor
> in the "R factor gap" is many small errors that are introduced during model
> building and not recognized and fixed later due to limited resolution. In a
> perfect world, refinement would find the global minimum but in practice all
> these little errors get stuck in local minima with distortions in neighboring
> atoms compensating for the initial error and thereby hiding their existence.
Excellent point.
On Thursday, October 28, 2010 02:49:11 pm Jacob Keller wrote:
> So let's say I take a 0.6 Ang structure, artificially introduce noise into
> corresponding Fobs to make the resolution go down to 2 Ang, and refine using
> the 0.6 Ang model--do I actually get R's better than the
> artificially-inflated sigmas?
> Or let's say I experimentally decrease I/sigma by attenuating the beam and
> collect another data set--same situation?
This I can answer based on experience. One can take the coordinates from a
structure refined at near atomic resolution (~1.0A), including multiple
conformations, partial occupancy waters, etc, and use it to calculate R factors
against a lower resolution (say 2.5A) data set collected from an isomorphous
crystal. The R factors from this total-rigid-body replacement will be better
than anything you could get from refinement against the lower resolution data.
In fact, refinement from this starting point will just make the R factors worse.
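
For anyone who wants to try the comparison described above, here is a minimal
sketch of the conventional R factor, R = sum||Fobs| - k|Fcalc|| / sum|Fobs|.
The function name r_factor and the single overall scale k are assumptions made
for illustration; real refinement programs apply bulk-solvent and
resolution-dependent scaling that this leaves out, and the amplitudes below
are made up.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional R = sum| |Fobs| - k|Fcalc| | / sum|Fobs|.

    k is a single overall scale chosen so the two sets of amplitudes
    have the same sum (an assumption; real programs scale more carefully).
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    k = f_obs.sum() / f_calc.sum()
    return float(np.abs(f_obs - k * f_calc).sum() / f_obs.sum())

# Made-up amplitudes purely for illustration (not real data).
f_obs = [10.0, 20.0]
f_calc = [10.0, 10.0]
r_example = r_factor(f_obs, f_calc)
print(f"R = {r_example:.3f}")  # R = 0.333
```

Fed with Fcalc from the untouched high resolution model and again with Fcalc
from the same model after refinement against the low resolution data, the two
numbers can be compared directly.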
What this tells us is that the crystallographic residuals can recognize a
better model when they see one. But our refinement programs are not good
enough to produce such a better model in the first place. Worse, they are not
even good enough to avoid degrading the model.
That's essentially the same thing Bart said, perhaps a little more pessimistic :-)
cheers,
Ethan
>
> JPK
>
> ----- Original Message -----
> From: Bart Hazes
> To: [email protected]
> Sent: Thursday, October 28, 2010 4:13 PM
> Subject: Re: [ccp4bb] Against Method (R)
>
>
> There are many cases where people use a structure refined at high
> resolution as a starting molecular replacement structure for a closely
> related/same protein with a lower resolution data set and get substantially
> better R statistics than you would expect for that resolution. So one factor
> in the "R factor gap" is many small errors that are introduced during model
> building and not recognized and fixed later due to limited resolution. In a
> perfect world, refinement would find the global minimum but in practice all
> these little errors get stuck in local minima with distortions in neighboring
> atoms compensating for the initial error and thereby hiding their existence.
>
> Bart
>
> On 10-10-28 11:33 AM, James Holton wrote:
> It is important to remember that if you have Gaussian-distributed errors
> and you plot error bars between +1 sigma and -1 sigma (where "sigma" is the
> rms error), then you expect the "right" curve to miss the error bars about
> 30% of the time. This is just a property of the Gaussian distribution: you
> expect a certain small number of the errors to be large. If the curve passes
> within the bounds of every single one of your error bars, then your error
> estimates are either too big, or the errors have a non-Gaussian distribution.
>
>
> For example, if the noise in the data somehow had a uniform distribution
> (always between +1 and -1), then no data point will ever be "kicked" further
> than "1" away from the "right" curve. In this case, a data point more than
> "1" away from the curve is evidence that you either have the wrong model
> (curve), or there is some other kind of noise around (wrong "error model").
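
The two cases above are easy to check numerically. The sketch below is only an
illustration: the helper names gaussian_miss_fraction and
simulated_miss_fraction are made up here, and the noise is simulated rather
than taken from any data set. It computes the fraction of points expected to
fall outside +/-1 sigma for Gaussian noise, and confirms that uniform noise
bounded by +/-1 never does.

```python
import math
import random

def gaussian_miss_fraction(n_sigma=1.0):
    """Probability that a Gaussian deviate lies outside +/- n_sigma."""
    return 1.0 - math.erf(n_sigma / math.sqrt(2.0))

def simulated_miss_fraction(draw, bound=1.0, n=100_000, seed=0):
    """Fraction of n simulated deviates whose magnitude exceeds bound."""
    rng = random.Random(seed)
    misses = sum(1 for _ in range(n) if abs(draw(rng)) > bound)
    return misses / n

analytic = gaussian_miss_fraction()  # about 0.317
gauss = simulated_miss_fraction(lambda rng: rng.gauss(0.0, 1.0))
flat = simulated_miss_fraction(lambda rng: rng.uniform(-1.0, 1.0))

print(f"Gaussian, analytic:  {analytic:.3f}")
print(f"Gaussian, simulated: {gauss:.3f}")
print(f"Uniform, simulated:  {flat:.3f}")
```

The analytic Gaussian value is 1 - erf(1/sqrt(2)), a little under one third,
which is the "about 30%" quoted above.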
>
> As someone who has spent a lot of time looking into how we measure
> intensities, I think I can say with some considerable amount of confidence
> that we are doing a pretty good job of estimating the errors. At least, they
> are certainly not off by an average of 40% (20% in F). You could do better
> than that estimating the intensities by eye!
>
> Everybody seems to have their own favorite explanation for what I call
> the "R factor gap": solvent, multi-conformer structures, absorption effects,
> etc. However, if you go through the literature (old and new) you will find
> countless attempts to include more sophisticated versions of each of these
> hypothetically "important" systematic errors, and in none of these cases has
> anyone ever presented a physically reasonable model that explained the
> observed spot intensities from a protein crystal to within experimental
> error. Or at least, if there is such a paper, I haven't seen it.
>
> Since there are so many possible things to "correct", what I would like
> to find is a structure that represents the transition between the "small
> molecule" and the "macromolecule" world. Lysozyme does not qualify! Even
> the famous 0.6 A structure of lysozyme (2vb1) still has a "mean absolute
> chi": <|Iobs-Icalc|/sig(I)> = 4.5. Also, the 1.4 A structure of the
> tetrapeptide QQNN (2olx) is only a little better at <|chi|> = 3.5. I realize
> that the "chi" I describe here is not a "standard" crystallographic
> statistic, and perhaps I need a statistics lesson, but it seems to me there
> ought to be a case where it is close to 1.
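
A small sketch of the "mean absolute chi" statistic defined above,
<|Iobs-Icalc|/sig(I)>, may help calibrate expectations. The function name
mean_abs_chi and all intensities and sigmas below are invented for
illustration. For a perfect model with purely Gaussian errors and correct
sigmas, the expected value is sqrt(2/pi), roughly 0.8, so "close to 1" is
about what an ideal case would give.

```python
import numpy as np

def mean_abs_chi(i_obs, i_calc, sig_i):
    """Mean of |Iobs - Icalc| / sig(I) over all reflections."""
    i_obs = np.asarray(i_obs, dtype=float)
    i_calc = np.asarray(i_calc, dtype=float)
    sig_i = np.asarray(sig_i, dtype=float)
    return float(np.mean(np.abs(i_obs - i_calc) / sig_i))

# Simulated "ideal" case: the model is exactly right and the only
# discrepancy is Gaussian noise with the stated sigma (made-up numbers).
rng = np.random.default_rng(0)
i_calc = rng.uniform(100.0, 1000.0, size=50_000)
sig_i = 0.05 * i_calc
i_obs = i_calc + rng.normal(0.0, sig_i)

ideal = mean_abs_chi(i_obs, i_calc, sig_i)
print(f"<|chi|> for ideal Gaussian case: {ideal:.2f}")
```

Against that baseline, the values of 4.5 and 3.5 quoted above are several
times larger than measurement error alone would produce.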
>
> -James Holton
> MAD Scientist
>
>
> On Thu, Oct 28, 2010 at 9:04 AM, Jacob Keller
> <[email protected]> wrote:
>
> So I guess there is never a case in crystallography in which our
> models predict the data to within the errors of data collection? I
> guess the situation might be similar to fitting a Michaelis-Menten
> curve, in which the fitted line often misses the error bars of the
> individual points, but gets the overall pattern right. In that case,
> though, I don't think we say that we are inadequately modelling the
> data. I guess there the error bars are actually too small (are
> underestimated.) Maybe our intensity errors are also underestimated?
>
> JPK
>
>
> On Thu, Oct 28, 2010 at 9:50 AM, George M. Sheldrick
> <[email protected]> wrote:
> >
> > Not quite. I was trying to say that for good small molecule data, R1 is
> > usually significantly less than Rmerge, but never less than the precision
> > of the experimental data measured by 0.5*<sigmaI>/<I> = 0.5*Rsigma
> > (or the very similar 0.5*Rpim).
> >
> > George
> >
> > Prof. George M. Sheldrick FRS
> > Dept. Structural Chemistry,
> > University of Goettingen,
> > Tammannstr. 4,
> > D37077 Goettingen, Germany
> > Tel. +49-551-39-3021 or -3068
> > Fax. +49-551-39-22582
> >
> >
> > On Thu, 28 Oct 2010, Jacob Keller wrote:
> >
> >> So I guess a consequence of what you say is that since in cases where
> >> there is no solvent the R values are often better than the precision of
> >> the actual measurements (never true with macromolecular crystals
> >> involving solvent), perhaps our real problem might be modelling solvent?
> >> Alternatively/additionally, I wonder whether there also might be more
> >> variability molecule-to-molecule in proteins, which we may not model
> >> well either.
> >>
> >> JPK
> >>
> >> ----- Original Message ----- From: "George M. Sheldrick"
> >> <[email protected]>
> >> To: <[email protected]>
> >> Sent: Thursday, October 28, 2010 4:05 AM
> >> Subject: Re: [ccp4bb] Against Method (R)
> >>
> >>
> >> > It is instructive to look at what happens for small molecules where
> >> > there is often no solvent to worry about. They are often refined
> >> > using SHELXL, which does indeed print out the weighted R-value based
> >> > on intensities (wR2), the conventional unweighted R-value R1 (based
> >> > on F) and <sigmaI>/<I>, which it calls R(sigma). For well-behaved
> >> > crystals R1 is in the range 1-5% and R(merge) (based on intensities)
> >> > is in the range 3-9%. As you suggest, 0.5*R(sigma) could be regarded
> >> > as the lower attainable limit for R1 and this is indeed the case in
> >> > practice (the factor 0.5 approximately converts from I to F). Rpim
> >> > gives similar results to R(sigma); both attempt to measure the
> >> > precision of the MERGED data, which are what one is refining against.
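
The factor 0.5 mentioned above follows from I = F^2, which gives
dI/I = 2 dF/F for small errors. The sketch below checks this numerically
with simulated values; the intensity, its sigma, and the variable names are
all made up for illustration and do not come from any data set.

```python
import numpy as np

rng = np.random.default_rng(1)

i_true = 1000.0  # made-up intensity
sig_i = 20.0     # made-up 2% error on I

# Simulate repeated measurements of I and convert each one to F = sqrt(I).
i_obs = rng.normal(i_true, sig_i, size=200_000)
f_obs = np.sqrt(i_obs)

rel_i = i_obs.std() / i_obs.mean()  # relative error on I
rel_f = f_obs.std() / f_obs.mean()  # relative error on F
ratio = rel_f / rel_i

print(f"sigma(I)/I = {rel_i:.4f}")
print(f"sigma(F)/F = {rel_f:.4f}")
print(f"ratio      = {ratio:.3f}")
```

The ratio comes out very close to 0.5 as long as the relative error is small;
the approximation degrades for weak reflections where sigma(I) is comparable
to I.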
> >> >
> >> > George
> >> >
> >> > Prof. George M. Sheldrick FRS
> >> > Dept. Structural Chemistry,
> >> > University of Goettingen,
> >> > Tammannstr. 4,
> >> > D37077 Goettingen, Germany
> >> > Tel. +49-551-39-3021 or -3068
> >> > Fax. +49-551-39-22582
> >> >
> >> >
> >> > On Wed, 27 Oct 2010, Ed Pozharski wrote:
> >> >
> >> > > On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> >> > > > the errors in our measurements apparently have no
> >> > > > bearing whatsoever on the errors in our models
> >> > >
> >> > > This would mean there is no point trying to get better crystals, right?
> >> > > Or am I also wrong to assume that the dataset with higher I/sigma in the
> >> > > highest resolution shell will give me a better model?
> >> > >
> >> > > On a related point - why is Rmerge considered to be the limiting value
> >> > > for the R? Isn't Rmerge a poorly defined measure itself that
> >> > > deteriorates at least in some circumstances (e.g. increased redundancy)?
> >> > > Specifically, shouldn't "ideal" R approximate 0.5*<sigmaI>/<I>?
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Ed.
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > "I'd jump in myself, if I weren't so good at whistling."
> >> > > Julian, King of Lemurs
> >> > >
> >> > >
> >>
> >>
> >> *******************************************
> >> Jacob Pearson Keller
> >> Northwestern University
> >> Medical Scientist Training Program
> >> Dallos Laboratory
> >> F. Searle 1-240
> >> 2240 Campus Drive
> >> Evanston IL 60208
> >> lab: 847.491.2438
> >> cel: 773.608.9185
> >> email: [email protected]
> >> *******************************************
> >>
> >>
> >
--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742