There are journals with specific requirements for these parameters, so it
matters where you publish. I've seen requirements that the highest
resolution shell have I/sig > 2 and completeness > 90%. Your mileage may
vary.

I typically process my data to a maximum I/sig near 1, with completeness
in the highest resolution shell of 50% or greater. It's reasonable to
expect the multiplicity/redundancy to be greater than 2, though that is
difficult with the lower symmetry space groups in triclinic and monoclinic
systems (depending upon crystal orientation and detector geometry). The
chi^2's should be relatively uniform over the entire resolution range,
near 1 in the highest resolution bins, and near 1 overall. With this set
of criteria, R(merge)/R(sym) (on I) can be as high as 20% overall and near
100% for the highest resolution shell. R is a poor descriptor when you
have a substantial number of weak intensities, because the denominator
(the sum of the intensities) becomes small while the numerator does not;
the chi^2's are a better descriptor, since they have essentially the same
numerator but are normalized by the sigmas instead.
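
To make that concrete, here is a minimal numpy sketch (the intensities and
sigmas are made up purely for illustration) comparing the two statistics
on a simulated strong shell and a simulated weak shell, each measured with
a redundancy of 4:

  import numpy as np
  rng = np.random.default_rng(1)

  def merge_stats(I_true, sigma, n_obs=4):
      # n_obs redundant observations per reflection, Gaussian errors
      obs = I_true[:, None] + sigma * rng.standard_normal((I_true.size, n_obs))
      mean = obs.mean(axis=1)
      # R(merge) on I: sum |I_i - <I>| / sum I_i
      r_merge = np.abs(obs - mean[:, None]).sum() / obs.sum()
      # reduced chi^2: the same deviations, but measured against the sigmas
      chi2 = (((obs - mean[:, None]) / sigma) ** 2).sum() / (obs.size - I_true.size)
      return r_merge, chi2

  print(merge_stats(rng.uniform(50.0, 500.0, 2000), sigma=5.0))  # ~1%, ~1.0
  print(merge_stats(rng.uniform(0.0, 6.0, 2000), sigma=5.0))     # ~100%, ~1.0

The sigmas are honest in both shells and the chi^2's stay near 1; only the
intensities shrink, yet R(merge) goes from about a percent to roughly 100%.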

One should also note that the I/sig criterion can be misleading. It is the
*average* I/sig in a resolution shell, and as such the shell includes
intensities both weaker and stronger than that average. If you discard the
highest resolution shell because its average falls below 2sig, then you
are also discarding individual intensities substantially greater than
2sig. The natural falloff of the intensities is reflected (no pun
intended) in the average B-factor of the structure, and you need the
higher resolution, weaker data to best define that parameter.
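
A toy numpy illustration of that point (the B-factor, scale, and sigmas
are assumed values): in a shell tuned so that the *average* I/sig is 2, a
large fraction of the individual reflections still exceed 2sig.

  import numpy as np
  rng = np.random.default_rng(2)

  # Wilson-type falloff: mean intensity drops as exp(-2B(sin(theta)/lambda)^2)
  B = 30.0                      # assumed average B-factor, in A^2
  s = 1.0 / (2.0 * 1.8)         # sin(theta)/lambda at d = 1.8 A
  mean_I = 100.0 * np.exp(-2.0 * B * s * s)

  # acentric intensities follow an exponential (Wilson) distribution
  I = rng.exponential(mean_I, 50000)
  sig = np.full(I.size, mean_I / 2.0)   # sigmas chosen so <I/sig> ~ 2

  print((I / sig).mean())      # ~2.0: the shell passes an average I/sig > 2 test
  print((I / sig > 2).mean())  # ~0.37: over a third of reflections are > 2sig

Cutting that shell discards every one of those well-measured reflections,
and with them much of the leverage on the overall B.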

Protein diffraction data is inherently weak, and far weaker than we obtain
for small molecule crystals. Generally, we need all the data we can get,
and the dynamic range of the data that we do get is smaller than that
observed for small molecule crystals. That's why we use restraints in
refinement. An observation of a weak intensity is just as valid as an
observation of a strong one, since you are minimizing a function related
to matching Iobs to Icalc. This is even more true with refinement targets
like the maximum likelihood function. The ONLY reasons we ever used I/sig
or F/sig cutoffs in refinement were that they made the calculations faster
(we were substantially limited by computing power decades ago), that the
sig's were not well-defined for weak intensities (especially for F's), and
that the detectors were not as sensitive. Now, with high brilliance x-ray
sources and modern detectors, you can, in fact, measure weak intensities
well--far better than we could decades ago. And while the range of
intensities in a protein dataset is relatively compressed in comparison to
a small molecule dataset, those weak terms near zero are important in
restraining the Fcalc's to be small, and therefore in helping to define
the phases properly.
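
As a cartoon of that argument (a generic weighted least-squares residual
on intensities; no particular program's target is implied):

  import numpy as np

  def lsq_residual(I_obs, I_calc, sig):
      # least-squares target on intensities: sum of ((Iobs - Icalc)/sig)^2
      return np.sum(((I_obs - I_calc) / sig) ** 2)

  # a reflection measured near zero penalizes a model that wrongly predicts
  # intensity there exactly as hard as a strong reflection penalizes an
  # equally wrong prediction, given comparable sigmas:
  print(lsq_residual(np.array([0.0]), np.array([25.0]), np.array([5.0])))    # 25.0
  print(lsq_residual(np.array([100.0]), np.array([75.0]), np.array([5.0])))  # 25.0

The weak observation pins its Icalc, and hence its Fcalc, near zero just
as firmly as the strong one pins its value.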

In 2007, I don't see a valid argument for severe cutoffs in I/sig at the
processing stage. I/sig = 1 and a reasonable completeness of 30-50% in the
highest resolution shell should be adequate to include most of the useful
data. Later on, during refinement, you can, indeed, trim the high
resolution limit if you wish. Again, with targets like maximum likelihood,
there is no statistical reason to do that. You do it because it makes the
R(cryst), R(free), and FOM look better. You do it because you want a 2.00A
rather than a 1.96A resolution structure. What is always true is that you
need to look at the maps, and they need as many terms in the Fourier
summation as you can include. There should never be an argument that
you're saving on computing cycles. It takes far longer to look carefully
at an electron density map and make decisions about what to do than to
carry out refinement. We're rarely talking about twice the computing time;
we're probably talking 10% more. That's definitely not a reason to throw
out data. We've got lots of computing power and lots of disk storage, so
let's use them to our advantage.

That's my nickel.

Bernie Santarsiero




On Thu, March 22, 2007 7:00 am, Ranvir Singh wrote:
> I will agree with Ulrich. Even at 3.0 A, it is possible to have a
> structure with reasonable accuracy which can explain the biological
> function or is consistent with available biochemical data.
> Ranvir
> --- Ulrich Genick <[EMAIL PROTECTED]> wrote:
>
>> Here are my 2-3 cents worth on the topic:
>>
>> The first thing to keep in mind is that the goal of a structure
>> determination is not to get the best stats or to claim the highest
>> possible resolution. The goal is to get the best possible structure,
>> and to be confident that observed features in a structure are real and
>> not the result of noise.
>>
>> From that perspective, if any of the conclusions one draws from a
>> structure change depending on whether one includes data with an I/sigI
>> in the highest resolution shell of 2 or 1, one probably treads on thin
>> ice.
>>
>> The general guide that one should include only data for which the
>> shell's average I/sigI > 2 comes from the following simple
>> consideration.
>>
>>
>> F/sigF = 2 I/sigI
>>
>> (This follows from error propagation: I is proportional to F^2, so
>> sigI is approximately 2F sigF.) So if you include data with an I/sigI
>> of 2, then your F/sigF = 4. In other words, you will have roughly a
>> 25% experimental uncertainty in your F.
>> Now assume that you actually knew the structure of your protein and
>> you calculated the crystallographic R-factor between the Fcalcs from
>> your true structure and the observed F's. In this situation, you would
>> expect to get a crystallographic R-factor around 25%, simply because
>> of the average error in your experimental structure factors. Since
>> most macromolecular structures have R-factors around 20%, it makes
>> little sense to include data where the experimental uncertainty alone
>> will guarantee that your R-factor will be worse. Of course, these days
>> maximum-likelihood refinement will just down-weight such data, and all
>> you do is burn CPU cycles.
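>>
>> (A quick numeric check of that, with made-up structure factors and a
>> Gaussian 25% error on F; a sketch, not a real dataset:
>>
>>   import numpy as np
>>   rng = np.random.default_rng(0)
>>   F_true = rng.uniform(10.0, 100.0, 100000)   # "true" F's, arbitrary scale
>>   F_obs = F_true * (1.0 + 0.25 * rng.standard_normal(F_true.size))
>>   print(np.sum(np.abs(F_obs - F_true)) / np.sum(F_obs))
>>
>> prints roughly 0.20, since for Gaussian errors the mean absolute error
>> is sqrt(2/pi) of sigma; a 25% uncertainty in F puts the floor for the
>> R-factor right in that 20-25% neighborhood.)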
>>
>>
>> If you actually want to do a semi-rigorous test of where you should
>> stop including data, simply include increasingly higher resolution
>> data in your refinement and see if your structure improves. If you
>> have really high resolution data (i.e., better than 1.2 Angstrom), you
>> can do matrix inversion in SHELX and get estimated standard deviations
>> (esds) for your refined parameters. As you include more and more data,
>> the esds should initially decrease. Simply keep including higher
>> resolution data until your esds start to increase again.
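>>
>> In pseudocode (refine_and_esd() below is a hypothetical stand-in for
>> whatever refinement engine you use, not a real API):
>>
>>   def best_cutoff(refine_and_esd, cutoffs=(1.6, 1.5, 1.4, 1.3, 1.2, 1.1)):
>>       # refine at each resolution cutoff (d_min, in Angstrom) and keep
>>       # the cutoff giving the smallest mean parameter esd
>>       esds = {d: refine_and_esd(d) for d in cutoffs}
>>       return min(esds, key=esds.get)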
>>
>> Similarly, for lower resolution data you can monitor some molecular
>> parameters that are not included in the stereochemical restraints and
>> see if the inclusion of higher-resolution data improves the agreement
>> between observed and expected values. For example, SHELX does not
>> restrain torsion angles in aliphatic portions of side chains. If your
>> structure improves, those angles should cluster more tightly around
>> +60, -60, and 180 degrees.
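>>
>> For instance, with numpy (the chi angles are assumed to have been
>> extracted from the model by some other tool):
>>
>>   import numpy as np
>>
>>   def mean_rotamer_deviation(chi_deg):
>>       # distance of each torsion from the nearest of +60, -60, 180 degrees,
>>       # handling the wrap-around at +/-180
>>       chi = np.asarray(chi_deg, dtype=float)
>>       targets = np.array([60.0, -60.0, 180.0])
>>       d = (chi[:, None] - targets[None, :] + 180.0) % 360.0 - 180.0
>>       return np.abs(d).min(axis=1).mean()
>>
>>   print(mean_rotamer_deviation([62.0, -58.0, 175.0, 90.0]))  # 9.75
>>
>> This number should drop as added high-resolution data genuinely
>> improves the structure.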
>>
>>
>>
>>
>> Cheers,
>>
>> Ulrich
>>
>>
>> > Could someone point me to some standards for data quality,
>> > especially for publishing structures? I'm wondering in particular
>> > about highest shell completeness, multiplicity, sigma and Rmerge.
>> >
>> > A co-worker pointed me to a '97 article by Kleywegt and Jones:
>> >
>> > http://xray.bmc.uu.se/gerard/gmrp/gmrp.html
>> >
>> > "To decide at which shell to cut off the resolution, we nowadays
>> > tend to use the following criteria for the highest shell:
>> > completeness > 80 %, multiplicity > 2, more than 60 % of the
>> > reflections with I > 3 sigma(I), and Rmerge < 40 %. In our opinion,
>> > it is better to have a good 1.8 Å structure, than a poor 1.637 Å
>> > structure."
>> >
>> > Are these recommendations still valid with maximum likelihood
>> > methods? We tend to use more data, especially in terms of the
>> > Rmerge and sigma cutoffs.
>> >
>> > Thanks in advance,
>> >
>> > Shane Atwell
>> >
>>