Re: [ccp4bb] @Phil:Death of Rmerge

aaleshin Sat, 02 Jun 2012 23:01:57 -0700

Could you please give me a reference to the "K & D paper"? Without reading it, 
I do not see a problem with Rmerge going to infinity in high resolution shells. 
Indeed, I was taught at school that the crystallographic resolution is defined 
as a minimal distance between two peaks that can be distinguished in the 
electron density map. I was also taught that under "normal conditions" this 
limit is reached when the data are collected out to the shell in which 
Rmerge = 0.5. One can collect more data (up to Rmerge = 1.0 or even 100), but 
the resolution of the electron density map will not change significantly.


I solved several structures of my own, and this simple rule worked every time. 
It failed only when the diffraction was very anisotropic, because the 
resolution was not uniform in all directions. But this obstacle can easily be 
overcome by presenting the resolution as a tensor with eigenvalues defined by 
the same simple rule (Rmerge = 0.5).

Now, why should such a simple method for estimating the data resolution be 
abandoned? Is <I/sigI> a much better criterion than Rmerge? Let's look at the 
definitions:

I is measured as the number of detector counts in the reflection minus the 
background counts. 
sigI is estimated from the square root of I, the standard deviation (SD) of 
the background, and various deviations from an ideal experiment (like noise 
from satellite crystals). 
Obviously, sigI cannot be measured accurately. Moreover, the 'resolution' is 
related to errors in the structure factors, which are averaged over several 
measurements. Errors in their scaling would affect the 'resolution', and 
<I/sigI> does not detect them, but Rmerge does!

Rmerge = sum |I - <I>| / sum I, summed over all measurements of all 
reflections, where <I> is the mean of the n measurements of the same structure 
factor (n is the data redundancy). When n -> infinity, Rmerge approaches 
<sigI/I>, up to a constant factor of order one. In my experience, a redundancy 
of 3-4 already gives very good agreement between Rmerge and <sigI/I>. If 
<sigI/I> is significantly lower than Rmerge, it means that the 
symmetry-related reflections did not merge well. Under those conditions, 
Rmerge becomes a much better criterion for estimating the 'resolution' than 
<sigI/I>.
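
To make the relation between Rmerge and <sigI/I> concrete, here is a minimal 
Python sketch. It assumes the usual sum-over-sums form of Rmerge and, for the 
expected value, Gaussian errors with the same sigI/I for every reflection. The 
function names and the list-of-lists data layout are illustrative assumptions, 
not taken from any processing program. Under those assumptions the expected 
Rmerge is sqrt(2/pi) * sqrt((n-1)/n) * sigI/I, so it tracks sigI/I with a 
constant of about 0.8 at high redundancy.

```python
from math import sqrt, pi


def rmerge(groups):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    `groups` is a list of lists (hypothetical layout): each inner list holds
    the repeated intensity measurements of one unique reflection.
    """
    num = 0.0
    den = 0.0
    for obs in groups:
        if len(obs) < 2:
            continue  # a single measurement carries no agreement information
        mean_i = sum(obs) / len(obs)
        num += sum(abs(i - mean_i) for i in obs)
        den += sum(obs)
    return num / den


def expected_rmerge(sig_over_i, n):
    """Expected Rmerge for redundancy n, ASSUMING Gaussian errors and a
    common sigI/I for all reflections (an idealisation)."""
    return sqrt(2.0 / pi) * sqrt((n - 1) / n) * sig_over_i
```

For example, expected_rmerge(0.5, 4) is about 0.35, whereas sigI/I itself is 
0.5. An observed Rmerge well above this expectation would point to poor 
merging rather than to counting noise.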

I AGREE THAT Rmerge=0.5 SHOULD NOT BE A CRITERION FOR DATA TRUNCATION. But we 
need a commonly accepted criterion to estimate the resolution, and Rmerge=0.5 
has stood the test of time. If someone decides to use <I/sigI> instead of 
Rmerge, fine, let it be 2.0. I do not know how that translates into CC...  
Alternatively, the resolution could be estimated from the electron density 
maps. But we need a commonly accepted rule for how to do it, and it should be 
related to the old Rmerge=0.5 rule. 
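
For reference, the CC1/2 that Phil mentions below is the correlation between 
the mean intensities of two half-datasets, obtained by splitting the 
measurements of each unique reflection into two halves. A rough Python sketch 
follows. The alternating (even/odd) split is a deterministic stand-in for the 
random split normally used, and the function names and data layout are 
assumptions made for illustration only.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def cc_half(groups):
    """CC1/2 over one resolution shell.

    `groups` is a list of lists (hypothetical layout): each inner list holds
    the repeated measurements of one unique reflection.  Measurements are
    split alternately into two halves; real programs split at random.
    """
    half_a, half_b = [], []
    for obs in groups:
        if len(obs) < 2:
            continue  # cannot be split into two non-empty halves
        half_a.append(mean(obs[0::2]))
        half_b.append(mean(obs[1::2]))
    return pearson(half_a, half_b)
```

Because no sigma enters the calculation, the value does not depend on how the 
processing program estimates errors.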

I hope everyone agrees that the resolution should not be dead.

Alex

 

On Jun 1, 2012, at 11:19 AM, Phil Evans wrote:

> As the K & D paper points out, as the signal/noise declines at higher 
> resolution, Rmerge goes up to infinity, so there is no sensible way to set a 
> limiting value to determine "resolution".
> 
> That is not to say that Rmerge has no use: as you say it's a reasonably good 
> metric to plot against image number to detect a problem. It is just not a 
> suitable metric for deciding resolution.
> 
> I/sigI is pretty good for this, even though the sigma estimates are not very 
> reliable. CC1/2 is probably better since it is independent of sigmas and has 
> defined values from 1.0 down to 0.0 as signal/noise decreases. But we should 
> be careful of any dogma which says what data we should discard, and what the 
> cutoff limits should be: I/sigI > 3, 2, or 1? CC1/2 > 0.2, 0.3, 0.5 ...? 
> Usually it does not make a huge difference, but why discard useful data? 
> Provided the data are properly weighted in refinement by weights 
> incorporating observed sigmas (true in  Refmac, not true in phenix.refine at 
> present I believe), adding extra weak data should do no harm, at least out to 
> some point. Program algorithms are improving in their treatment of weak data, 
> but are by no means perfect.
> 
> One problem as discussed earlier in this thread is that we have got used to 
> the idea that nominal resolution is a single number indicating the quality of 
> a structure, but this has never been true, irrespective of the cutoff method. 
> Apart from the considerable problem of anisotropy, we all need to note the 
> wisdom of Ethan Merritt
> 
>> "We should also encourage people not to confuse the quality of 
>> the data with the quality of the model."
> 
> Phil
> 
> 
> 
> On 1 Jun 2012, at 18:59, aaleshin wrote:
> 
>> Please excuse my ignorance, but I cannot understand why Rmerge is unreliable 
>> for estimating the resolution.
>> I mean, from a theoretical point of view, <I/sigma> is indeed a better 
>> criterion, but it is not obvious from a practical point of view.
>> 
>> <I/sigma> depends on the method used for sigma estimation, and so the same 
>> data processed by different programs may have different <I/sigma>. Moreover, 
>> HKL2000 allows users to adjust sigmas manually. Rmerge estimates sigmas from 
>> differences between measurements of the same structure factor, and hence is 
>> independent of our preferences. But it also has a very important ability 
>> to validate the consistency of the merged data. If my crystal changed during 
>> the data collection, or something went wrong with the diffractometer, Rmerge 
>> will show it immediately, but <I/sigma> will not.
>> 
>> So, please explain why we should stop using Rmerge as a criterion of data 
>> resolution.
>> 
>> Alex
>> Sanford-Burnham Medical Research Institute
>> 10901 North Torrey Pines Road
>> La Jolla, California 92037
>> 
>> 
>> 
>> On Jun 1, 2012, at 5:07 AM, Ian Tickle wrote:
>> 
>>> On 1 June 2012 03:22, Edward A. Berry wrote:
>>>> Leo will probably answer better than I can, but I would say I/SigI counts
>>>> only the present reflection, so eliminating noise by anisotropic truncation
>>>> should improve it, raising the average I/SigI in the last shell.
>>> 
>>> We always include unmeasured reflections with I/sigma(I) = 0 in the
>>> calculation of the mean I/sigma(I) (i.e. we divide the sum of
>>> I/sigma(I) for measureds by the predicted total no of reflections incl
>>> unmeasureds), since for unmeasureds I is (almost) completely unknown
>>> and therefore sigma(I) is effectively infinite (or at least finite but
>>> large since you do have some idea of what range I must fall in).  A
>>> shell with <I/sigma(I)> = 2 and 50% completeness clearly doesn't carry
>>> the same information content as one with the same <I/sigma(I)> and
>>> 100% complete; therefore IMO it's very misleading to quote
>>> <I/sigma(I)> including only the measured reflections.  This also means
>>> we can use a single cut-off criterion (we use mean I/sigma(I) > 1),
>>> and we don't need another arbitrary cut-off criterion for
>>> completeness.  As many others seem to be doing now, we don't use
>>> Rmerge, Rpim etc as criteria to estimate resolution, they're just too
>>> unreliable - Rmerge is indeed dead and buried!
>>> 
>>> Actually a mean value of I/sigma(I) of 2 is highly statistically
>>> significant, i.e. very unlikely to have arisen by chance variations,
>>> and the significance threshold for the mean must be much closer to 1
>>> than to 2.  Taking an average always increases the statistical
>>> significance, therefore it's not valid to compare an _average_ value
>>> of I/sigma(I) = 2 with a _single_ value of I/sigma(I) = 3 (taking 3
>>> sigma as the threshold of statistical significance of an individual
>>> measurement): that's a case of "comparing apples with pears".  In
>>> other words in the outer shell you would need a lot of highly
>>> significant individual values >> 3 to attain an overall average of 2
>>> since the majority of individual values will be < 1.
>>> 
>>>> F/sigF is expected to be better than I/sigI because d(x^2) = 2x dx,
>>>> d(x^2)/x^2 = 2 dx/x, dI/I = 2 dF/F  (or approaches that in the limit . . .)
>>> 
>>> That depends on what you mean by 'better': every metric must be
>>> compared with a criterion appropriate to that metric. So if we are
>>> comparing I/sigma(I) with a criterion value = 3, then we must compare
>>> F/sigma(F) with criterion value = 6 ('in the limit' of zero I), in
>>> which case the comparison is no 'better' (in terms of information
>>> content) with I than with F: they are entirely equivalent.  It's
>>> meaningless to compare F/sigma(F) with the criterion value appropriate
>>> to I/sigma(I): again that's "comparing apples and pears"!
>>> 
>>> Cheers
>>> 
>>> -- Ian
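
Ian's convention of counting unmeasured reflections as I/sigma(I) = 0 amounts 
to dividing the sum over measured reflections by the predicted number of 
reflections in the shell, which is the same as scaling the mean over measured 
reflections by the completeness. A minimal Python sketch, with a hypothetical 
function name and argument layout:

```python
def mean_i_over_sigma(measured, n_predicted):
    """Mean I/sigma(I) over ALL predicted reflections in a shell.

    `measured` is a list of (I, sigI) pairs for the measured reflections.
    Unmeasured reflections contribute 0, so the sum is divided by
    `n_predicted` rather than by len(measured).
    """
    if n_predicted <= 0 or n_predicted < len(measured):
        raise ValueError("n_predicted must be positive and >= len(measured)")
    return sum(i / sig for i, sig in measured) / n_predicted
```

With five measured reflections that each have I/sigma(I) = 2 in a shell of ten 
predicted reflections (50% complete), the function returns 1.0, which is the 
distinction Ian draws between a half-complete and a fully complete shell.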
