I'd like to comment on Kay's statement, "People using a cutoff of 2 (or 3, or 1) for
the mean I/sigI are just using an arbitrary number, as if it were magic."
That may be true for values of 1 or 3, but 45 years ago, I was told by Lyle
Jensen why a 2sigma(I) cutoff was appropriate.
When people obtained reflection intensities from photographic film, some of the
reflections were "unobserved" because they were below the cloudiness of the
film. When diffractometers came into use with scintillation counters, measurements could
be made for all reflections.
So, if you had a refined structure based on photographic data, how could you
compare it and its data set with those obtained from a diffractometer?
It was determined that a 2sigma(I) cutoff corresponded to the "unobserved"
level from film data.
I don't know how quantitative this determination was, but it wasn't exactly
"arbitrary".
Ron
On Sat, 28 Oct 2017, Kay Diederichs wrote:
The idea was to cut all datasets at, say, 30% CC1/2 to see how they differ in
resolution, I/sigI etc. for that given CC1/2 …
Not sure which insight that would give you. CC1/2 and mean I/sigI of the merged data are
related quantities; that relation is given in (1). The formula given in "Box 1"
of that paper shows that a CC1/2 of 20% corresponds to an average I/sigI of the merged
data around 1, and 30% corresponds to about 1.3 .
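To make these two numbers easy to check, here is a minimal sketch in Python. It assumes the relation can be written as CC1/2 ≈ 1 / (1 + 4 / <I/sigI>^2), which is my reading of it and reproduces both values quoted above; please check the exact form against Box 1 of (1). The function names are made up for this illustration.

```python
import math

def cc_half_from_i_over_sigma(i_over_sigma):
    """Approximate CC1/2 for a given mean I/sigI of the merged data.

    Assumed form of the relation: CC1/2 ~ 1 / (1 + 4 / <I/sigI>^2).
    """
    return 1.0 / (1.0 + 4.0 / i_over_sigma ** 2)

def i_over_sigma_from_cc_half(cc_half):
    """Inverse of the assumed relation: <I/sigI> = 2 * sqrt(CC1/2 / (1 - CC1/2))."""
    return 2.0 * math.sqrt(cc_half / (1.0 - cc_half))

for cc in (0.20, 0.30):
    print(f"CC1/2 = {cc:.0%} -> mean I/sigI of merged data ~ {i_over_sigma_from_cc_half(cc):.2f}")
```

With these definitions, 20% maps to a mean I/sigI of 1.00 and 30% to about 1.31.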
The advantage of CC1/2 over mean I/sigI is that the sigmas are not required.
Sigmas are difficult to get right, or even consistent, and different programs
result in different sigmas for the same data.
Furthermore, correlation coefficients have known statistical properties, e.g. their
"significance" (the probability of a given value, or higher, arising by chance) can be
calculated. If that "significance" has a low numerical value (e.g. 0.1%) then you may
conclude that the correlation is due to signal in your data. In this example, you would
_wrongly_ conclude that there is signal in only (statistically) 1 out of 1000 cases.
Whether a correlation coefficient is significant at a given "significance level" (e.g.
0.1% which is the value that results in a "*" appended to the numerical value in
CORRECT.LP and XSCALE.LP) depends on its numerical value, and the number of unique reflections it
is based upon. There is thus no fixed cutoff. BTW no such insight is available for the mean I/sigI
of the merged data.
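To illustrate why there is no fixed cutoff, here is a rough sketch of such a significance test in Python. It uses the Fisher z-transformation with a normal approximation, which is an assumption chosen for simplicity and is not necessarily the test that XDS applies; the function names and the example numbers are hypothetical.

```python
import math

def chance_probability(cc, n):
    """One-sided probability of a correlation >= cc arising by chance
    from n pairs of uncorrelated values (Fisher z approximation).

    Requires n > 3 and -1 < cc < 1.
    """
    z = math.atanh(cc) * math.sqrt(n - 3)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def is_significant(cc, n, level=0.001):
    """True if the chance probability is below the significance level (default 0.1%)."""
    return chance_probability(cc, n) < level

# The same correlation of 15% in a shell with few or with many reflections:
for n in (100, 1000):
    print(n, chance_probability(0.15, n), is_significant(0.15, n))
```

In this approximation a CC1/2 of 15% is not significant at the 0.1% level when it is based on 100 reflections, but it is when it is based on 1000.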
People using a cutoff of 2 (or 3, or 1) for the mean I/sigI are just using an arbitrary
number, as if it were magic. As long as it is "significant", the same goes for
a CC1/2 cutoff of 20% or 30% or ... it is arbitrary. CC1/2 = 14.3% is the value where the
correlation of the merged intensities with the (unknown) true intensities can be expected
to be 50% - this is just to put the numbers into perspective, and is not to be used as a
cutoff.
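As far as I know, the 14.3% figure comes from the CC* estimate, CC* = sqrt(2 CC1/2 / (1 + CC1/2)), which gives exactly 50% for CC1/2 = 1/7. A small Python sketch of that arithmetic follows; the function name cc_star is just a label chosen for this example.

```python
import math

def cc_star(cc_half):
    """Estimated correlation of the merged intensities with the (unknown)
    true intensities, computed from CC1/2."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

for cc in (1.0 / 7.0, 0.20, 0.30):
    print(f"CC1/2 = {cc:.3f} -> CC* = {cc_star(cc):.3f}")
```

This is only meant to put the numbers into perspective, as said above, not to suggest a cutoff.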
For refinement, there is no "best" cutoff that always works. It depends on the accuracy
of the model whether it can extract information from the weak intensities in the high-resolution
data. There is a useful test called "paired refinement" that helps to find out whether the
weak data really improve the model, or not. It is rather simple to apply that test (PDB_REDO does
it in an automated way) but its outcome depends on both the accuracy of the data, and the accuracy
of the model.
It is safe to err on the side of a "too optimistic" high-resolution cutoff because there is
no degradation of the model when using those data. But cutting "too low" may mean missing
the opportunity to get a better model.
One insight (Garib Murshudov) is that if the R/Rfree of your model in the
high-resolution shell is >42% (assuming no twinning or tNCS) then that matches
what would be obtained by refinement of the correct model against constant
intensities (as derived from the Wilson plot) - an indication that one should
rather not use the data beyond this resolution for refinement, or that the model
has significant errors.
Hope this helps,
Kay
(1) Karplus, P.A., Diederichs, K. (2015) Assessing and maximizing data quality
in macromolecular crystallography. Curr. Opin. Struct. Biol. 34, 60-68; online
at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684713