I'd like to comment on Kay's statement, "People using a cutoff of 2 (or 3, or 1) for 
the mean I/sigI are just using an arbitrary number, as if it were magic."

That may be true for values of 1 or 3, but 45 years ago, I was told by Lyle 
Jensen why a 2sigma(I) cutoff was appropriate.

When people obtained reflection intensities from photographic film, some of the 
reflections were "unobserved" because they were below the cloudiness of the 
film.  When diffractometers came into use with scintillation counters, measurements could 
be made for all reflections.

So, if you had a refined structure based on photographic data, how could you 
compare it and its data set with one obtained from a diffractometer?

It was determined that a 2sigma(I) cutoff corresponded to the "unobserved" 
level from film data.

I don't know how quantitative this determination was, but it wasn't exactly 
"arbitrary".

Ron

On Sat, 28 Oct 2017, Kay Diederichs wrote:

The idea was to cut all datasets at say 30% CC1/2 to see how they differ in 
resolution, I/sigI etc. for that given CC1/2 …

Not sure which insight that would give you. CC1/2 and mean I/sigI of the merged data are 
related quantities; that relation is given in (1). The formula given in "Box 1" 
of that paper shows that a CC1/2 of 20% corresponds to an average I/sigI of the merged 
data around 1, and 30% corresponds to about 1.3 .
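To make those two numbers reproducible, here is a minimal sketch. It assumes the relation takes the form CC1/2 ≈ 1 / (1 + 4 / <I/sigI>^2), which reproduces both values quoted above; please check it against "Box 1" of (1) before relying on it. The function names are just illustrative.

```python
import math

def cc_half_from_i_over_sigma(i_over_sigma):
    """Expected CC1/2 for a given mean I/sigI of the merged data.

    Assumed relation: CC1/2 ~ 1 / (1 + 4 / <I/sigI>^2).
    """
    return 1.0 / (1.0 + 4.0 / i_over_sigma ** 2)

def i_over_sigma_from_cc_half(cc_half):
    """Inverse of the assumed relation; valid for 0 < cc_half < 1."""
    return 2.0 * math.sqrt(cc_half / (1.0 - cc_half))

for cc in (0.20, 0.30):
    print(f"CC1/2 = {cc:.0%} -> mean I/sigI ~ {i_over_sigma_from_cc_half(cc):.2f}")
```

With this form, 20% maps to 1.00 and 30% to about 1.31, in line with the values above.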

The advantage of CC1/2 over mean I/sigI is that the sigmas are not required. 
Sigmas are difficult to get right, or even consistent, and different programs 
result in different sigmas for the same data.

Furthermore, correlation coefficients have known statistical properties, e.g. their 
"significance" (the probability of a given value, or higher, arising by chance) can be 
calculated. If that "significance" has a low numerical value (e.g. 0.1%) then you may 
conclude that this value is due to signal in your data. In this example, you would _wrongly_ conclude 
that there is signal in only (statistically) 1 out of 1000 cases.

Whether a correlation coefficient is significant at a given "significance level" (e.g. 
0.1% which is the value that results in a "*" appended to the numerical value in 
CORRECT.LP and XSCALE.LP) depends on its numerical value, and the number of unique reflections it 
is based upon. There is thus no fixed cutoff. BTW no such insight is available for the mean I/sigI 
of the merged data.
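To illustrate why the cutoff moves with the number of reflections, here is a rough sketch of a one-sided significance test for a correlation coefficient. It uses the Fisher z-transformation with a normal approximation; that is my choice for a short, dependency-free example and is not claimed to be the procedure XDS uses internally. The names cc_p_value, is_significant and n_pairs are hypothetical.

```python
import math

def cc_p_value(cc, n_pairs):
    """Approximate probability of a correlation >= cc arising by chance.

    Fisher z-transformation with a normal approximation (an assumption of
    this sketch, not necessarily what any data-processing program does).
    """
    if n_pairs < 4 or not -1.0 < cc < 1.0:
        raise ValueError("need n_pairs >= 4 and -1 < cc < 1")
    z = math.atanh(cc) * math.sqrt(n_pairs - 3)
    # One-sided upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def is_significant(cc, n_pairs, level=0.001):
    """True if cc is significant at the given level (default 0.1%)."""
    return cc_p_value(cc, n_pairs) < level

# The same CC value can be significant or not, depending on the count.
for n in (100, 1000):
    print(n, is_significant(0.15, n))
```

In this approximation a correlation of 0.15 is not significant at the 0.1% level with 100 pairs, but is with 1000 pairs.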

People using a cutoff of 2 (or 3, or 1) for the mean I/sigI are just using an arbitrary 
number, as if it were magic. The same goes for a CC1/2 cutoff of 20% or 30% or ...: as long as the 
value is "significant", the cutoff is arbitrary. CC1/2 = 14.3% is the value where the 
correlation of the merged intensities with the (unknown) true intensities can be expected 
to be 50% - this is just to put the numbers into perspective, and is not to be used as a 
cutoff.
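The 14.3% figure can be checked with a few lines. This sketch assumes the estimate of the correlation with the true intensities usually written as CC* = sqrt(2 CC1/2 / (1 + CC1/2)); the function name is illustrative.

```python
import math

def cc_star(cc_half):
    """Estimated correlation of merged intensities with the true intensities.

    Assumed relation: CC* = sqrt(2 * CC1/2 / (1 + CC1/2)), for CC1/2 >= 0.
    """
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

print(f"CC1/2 = 14.3% -> CC* ~ {cc_star(0.143):.3f}")
```

Setting CC* = 0.5 and solving gives CC1/2 = 1/7, i.e. about 14.3%.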

For refinement, there is no "best" cutoff that always works. It depends on the accuracy 
of the model whether it can extract information from the weak intensities in the high-resolution 
data. There is a useful test called "paired refinement" that helps to find out whether the 
weak data really improve the model. It is rather simple to apply that test (PDB_REDO does 
it in an automated way) but its outcome depends on both the accuracy of the data, and the accuracy 
of the model.

It is safe to err on the side of a "too optimistic" high-resolution cutoff because there is 
no degradation of the model when using those data. But cutting at "too low" a resolution may 
mean missing the opportunity to get a better model.

One insight (Garib Murshudov) is that if the R/Rfree of your model in the 
high-resolution shell is >42% (assuming no twinning or tNCS) then that matches 
what would be obtained by refinement of the correct model against constant 
intensities (as derived from the Wilson plot) - an indication that one should 
rather not use the data beyond this resolution for refinement, or that the model 
has significant errors.

Hope this helps,

Kay

(1) Karplus, P.A., Diederichs, K. (2015) Assessing and maximizing data quality 
in macromolecular crystallography. Curr. Opin. Struct. Biol. 34, 60-68; online 
at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684713
