What is the difference between Rmerge and Rsym - I thought they were the same?
Rrim == Rmeas I think

Phil



> On 10 Jul 2017, at 15:18, John Berrisford <j...@ebi.ac.uk> wrote:
> 
> Dear Herman
> 
> The new PDB deposition system (OneDep) allows you to enter values for Rmerge, 
> Rsym, Rpim, Rrim and / or CC half. If, during deposition, you do not provide 
> a value for any of these metrics then we will ask you for a value for one of 
> them.
> 
> Also, PDB format is a legacy format for the PDB. In 2014 mmCIF became the 
> archive format for the PDB and some large entries are no longer distributed 
> in PDB format. mmCIF is not limited by the constraints of punch cards.
> 
> Please see https://www.wwpdb.org/documentation/file-formats-and-the-pdb
> 
> Regards
> 
> John
> 
> PDBe
> 
> 
> 
> On 10/07/2017 09:26, herman.schreu...@sanofi.com wrote:
>> Dear All,
>> 
>> For me this whole discussion is an example of a large number of people 
>> barking up the wrong tree. The real issue is not whether data processing 
>> programs print an Rmerge among their many quality indicators, but the 
>> fact that the PDB and many journals still insist on using Rmerge as the 
>> primary quality indicator. As long as this is true, novice scientists might 
>> be led to believe that Rmerge is the most important quality indicator. As 
>> soon as the PDB and the journals request some other indicator, this will be 
>> over. So that is where we should direct our efforts.
>> 
>> I don't understand at all why the PDB still insists on an obsolete quality 
>> indicator. However, the PDB format for the coordinates also dates back to 
>> the 1970s, when it was designed for punch cards.
>> 
>> My 2 cents.
>> Herman
>> 
>> 
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] Im Auftrag von 
>> Edward A. Berry
>> Gesendet: Samstag, 8. Juli 2017 22:31
>> An: CCP4BB@JISCMAIL.AC.UK
>> Betreff: Re: [ccp4bb] Rmergicide Through Programming
>> 
>> But R-merge is not really narrower as a fraction of the mean value - it just 
>> gets smaller proportionally as all the numbers get smaller: the RMSD of 
>> 0.0043 for R-meas, multiplied by the factor 0.022/0.027, gives 0.0035, which 
>> is the RMSD for R-merge. The same was true in the previous example. You 
>> could multiply R-meas by 0.5 or 0.2 and get a sharper distribution yet! And 
>> that factor would be constant, whereas this one only applies at super-low 
>> redundancy.
>> 
>> On 07/08/2017 03:23 PM, James Holton wrote:
>>> The expected distribution of Rmeas values is still wider than that of 
>>> Rmerge for data with I/sigma=30 and average multiplicity=2.0. Graph 
>>> attached.
>>> 
>>> I expect that anytime you incorporate more than one source of information 
>>> you run the risk of a noisier statistic, because every source of information 
>>> can contain noise.  That is, Rmeas combines information about multiplicity 
>>> with the absolute deviates in the data to form a statistic that is more 
>>> accurate than Rmerge, but also (potentially) less precise.
>>> 
>>> Perhaps that is what we are debating here?  Which is better? accuracy or 
>>> precision?  Personally, I prefer to know both.
>>> 
>>> -James Holton
>>> MAD Scientist
>>> 
>>> On 7/8/2017 11:02 AM, Frank von Delft wrote:
>>>> It is quite easy to end up with low multiplicities in the low resolution 
>>>> shell, especially for low symmetry and fast-decaying crystals.
>>>> 
>>>> It is this scenario where Rmerge (lowres) is more misleading than Rmeas.
>>>> 
>>>> phx
>>>> 
>>>> 
>>>> On 08/07/2017 17:31, James Holton wrote:
>>>>> What does Rmeas tell us that Rmerge doesn't?  Given that we know the 
>>>>> multiplicity?
>>>>> 
>>>>> -James Holton
>>>>> MAD Scientist
>>>>> 
>>>>> On 7/8/2017 9:15 AM, Frank von Delft wrote:
>>>>>> Anyway, back to reality:  does anybody still use R statistics to 
>>>>>> evaluate anything other than /strong/ data?  Certainly I never look at 
>>>>>> it except for the low-resolution bin (or strongest reflections). 
>>>>>> Specifically, a "2%-dataset" in that bin is probably healthy, while a 
>>>>>> "9%-dataset" probably Has Issues.
>>>>>> 
>>>>>> In which case, back to Jacob's question:  what does Rmerge tell us that 
>>>>>> Rmeas doesn't.
>>>>>> 
>>>>>> phx
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 08/07/2017 17:02, James Holton wrote:
>>>>>>> Sorry for the confusion.  I was going for brevity!  And failed.
>>>>>>> 
>>>>>>> I know that the multiplicity correction is applied on a per-hkl basis 
>>>>>>> in the calculation of Rmeas.  However, the average multiplicity over 
>>>>>>> the whole calculation is most likely not an integer. Some hkls may be 
>>>>>>> observed twice while others only once, or perhaps 3-4 times in the same 
>>>>>>> scaling run.
>>>>>>> 
>>>>>>> Allow me to do the error propagation properly.  Consider the scenario:
>>>>>>> 
>>>>>>> Your outer resolution bin has a true I/sigma = 1.00 and average 
>>>>>>> multiplicity of 2.0. Let's say there are 100 hkl indices in this bin.  
>>>>>>> I choose the "true" intensities of each hkl from an exponential (aka 
>>>>>>> Wilson) distribution. Further assume the background is high, so the 
>>>>>>> error in each observation after background subtraction may be taken 
>>>>>>> from a Gaussian distribution. Let's further choose the per-hkl 
>>>>>>> multiplicity from a Poisson distribution with expectation value 2.0, so 
>>>>>>> 0 is possible, but the long-term average multiplicity is 2.0. For R 
>>>>>>> calculation, when multiplicity of any given hkl is less than 2 it is 
>>>>>>> skipped. What I end up with after 120,000 trials is a distribution of 
>>>>>>> values for each R factor.  See attached graph.
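The procedure described above can be sketched in a few lines. This is a minimal reconstruction of the simulation as described (Wilson-distributed true intensities, Gaussian background-dominated errors, Poisson per-hkl multiplicity, singletons skipped), not the actual script behind the attached graph, so the exact numbers will differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_r_factors(n_hkl=100, i_over_sigma=1.0, mean_mult=2.0,
                       n_trials=2000):
    """One Rmerge and one Rmeas per trial: true intensities from an
    exponential (Wilson) distribution, Gaussian errors, per-hkl
    multiplicity from a Poisson with the given mean; hkls observed
    fewer than twice are skipped, as in the text above."""
    rmerge, rmeas = [], []
    for _ in range(n_trials):
        true_i = rng.exponential(1.0, size=n_hkl)
        sigma = true_i.mean() / i_over_sigma   # sets the bin's <I>/sigma
        mult = rng.poisson(mean_mult, size=n_hkl)
        num_merge = num_meas = den = 0.0
        for i_true, n in zip(true_i, mult):
            if n < 2:
                continue
            obs = i_true + rng.normal(0.0, sigma, size=n)
            dev = np.abs(obs - obs.mean()).sum()
            num_merge += dev                              # Rmerge numerator
            num_meas += np.sqrt(n / (n - 1)) * dev        # Rmeas numerator
            den += obs.sum()
        if den > 0:
            rmerge.append(num_merge / den)
            rmeas.append(num_meas / den)
    return np.array(rmerge), np.array(rmeas)

rmerge, rmeas = simulate_r_factors()
print("Rmerge middle half:", np.round(np.percentile(rmerge, [25, 75]), 3))
print("Rmeas  middle half:", np.round(np.percentile(rmeas, [25, 75]), 3))
```

Every trial has the same true I/sigma and average multiplicity, yet each yields a different R factor; the interquartile ranges printed at the end are the "width" of the distribution being discussed.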
>>>>>>> 
>>>>>>> What I hope is readily apparent is that the distribution of Rmerge
>>>>>>> values is taller and sharper than that of the Rmeas values.  The most 
>>>>>>> likely Rmeas is 80% and that of Rmerge is 64.6%.  This is expected, of 
>>>>>>> course.  But what I hope to impress upon you is that the most likely 
>>>>>>> value is not generally the one that you will get! The distribution has 
>>>>>>> a width.  Specifically, Rmeas could be as low as 40%, or as high as 
>>>>>>> 209%, depending on the trial.  Half of the trial results fall 
>>>>>>> between 71.4% and 90.3%, a range of 19 percentage points.  Rmerge has a 
>>>>>>> middle-half range from 57.6% to 72.9% (15.3 percentage points).  This 
>>>>>>> range of possible values of Rmerge or Rmeas from data with the same 
>>>>>>> intrinsic quality is what I mean when I say "numerical instability".  
>>>>>>> Each and every trial had the same true I/sigma and multiplicity, and 
>>>>>>> yet the R factors I get vary depending on the trial.  Unfortunately for 
>>>>>>> most of us with real data, you only ever get one trial, and you can't 
>>>>>>> predict which Rmeas or Rmerge you'll get.
>>>>>>> 
>>>>>>> My point here is that R statistics in general are not comparable from 
>>>>>>> experiment to experiment when you are looking at data with low average 
>>>>>>> intensity and low multiplicity, and it appears that Rmeas is less 
>>>>>>> stable than Rmerge.  Not by much, mind you, but still jumps around more.
>>>>>>> 
>>>>>>> Hope that is clearer?
>>>>>>> 
>>>>>>> Note that in no way am I suggesting that low-multiplicity is the right 
>>>>>>> way to collect data.  Far from it.  Especially with modern detectors 
>>>>>>> that have negligible read-out noise. But when micro crystals only give 
>>>>>>> off a handful of photons each before they die, low multiplicity might 
>>>>>>> be all you have.
>>>>>>> 
>>>>>>> -James Holton
>>>>>>> MAD Scientist
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/7/2017 2:33 PM, Edward A. Berry wrote:
>>>>>>>> I think the confusion here is that the "multiplicity correction"
>>>>>>>> is applied on each reflection, where it will be an integer 2 or
>>>>>>>> greater (can't estimate variance with only one measurement). You
>>>>>>>> can only correct in an approximate way using the average
>>>>>>>> multiplicity of the dataset, since it would depend on the distribution 
>>>>>>>> of multiplicity over the reflections.
>>>>>>>> 
>>>>>>>> And the correction is for r-merge. You don't need to apply a
>>>>>>>> correction to R-meas.
>>>>>>>> R-meas is a redundancy-independent best estimate of the variance.
>>>>>>>> Whatever you would have used R-merge for (hopefully taking
>>>>>>>> allowance for the multiplicity) you can use R-meas and not worry about 
>>>>>>>> multiplicity.
>>>>>>>> Again, what information does R-merge provide that R-meas does not
>>>>>>>> provide in a more accurate way?
>>>>>>>> 
>>>>>>>> According to the Denzo manual, one way to artificially reduce
>>>>>>>> R-merge is to include reflections with only one measurement
>>>>>>>> (averaging in a lot of zeros always helps bring an average
>>>>>>>> down), and they say there were actually some programs that did
>>>>>>>> that. However, I'm quite sure none of the ones we rely on today do that.
>>>>>>>> 
>>>>>>>> On 07/07/2017 03:12 PM, Kay Diederichs wrote:
>>>>>>>>> James,
>>>>>>>>> 
>>>>>>>>> I cannot follow you. "n approaches 1" can only mean n = 2 because n 
>>>>>>>>> is integer. And for n=2 the sqrt(n/(n-1)) factor is well-defined. For 
>>>>>>>>> n=1, neither contributions to Rmeas nor Rmerge nor to any other 
>>>>>>>>> precision indicator can be calculated anyway, because there's nothing 
>>>>>>>>> this measurement can be compared against.
>>>>>>>>> 
>>>>>>>>> just my 2 cents,
>>>>>>>>> 
>>>>>>>>> Kay
>>>>>>>>> 
>>>>>>>>> On Fri, 7 Jul 2017 10:57:17 -0700, James Holton 
>>>>>>>>> <jmhol...@slac.stanford.edu> wrote:
>>>>>>>>> 
>>>>>>>>>> I happen to be one of those people who think Rmerge is a very
>>>>>>>>>> useful statistic.  Not as a method of evaluating the resolution
>>>>>>>>>> limit, which is mathematically ridiculous, but for a host of
>>>>>>>>>> other important things, like evaluating the performance of data
>>>>>>>>>> collection equipment, and evaluating the isomorphism of different 
>>>>>>>>>> crystals, to name a few.
>>>>>>>>>> 
>>>>>>>>>> I like Rmerge because it is a simple statistic that has a
>>>>>>>>>> simple formula and has not undergone any "corrections".
>>>>>>>>>> Corrections increase complexity, and complexity opens the door
>>>>>>>>>> to manipulation by the desperate and/or misguided.  For
>>>>>>>>>> example, overzealous outlier rejection is a common way to abuse
>>>>>>>>>> R factors, and it is far too often swept under the rug,
>>>>>>>>>> sometimes without the user even knowing about it. This is
>>>>>>>>>> especially problematic when working in a regime where the statistic 
>>>>>>>>>> of interest is unstable, and for R factors this is low intensity 
>>>>>>>>>> data.
>>>>>>>>>> Rejecting just the right "outliers" can make any R factor look
>>>>>>>>>> a lot better.  Why would Rmeas be any more unstable than
>>>>>>>>>> Rmerge? Look at the formula. There is an "n-1" in the
>>>>>>>>>> denominator, where n is the multiplicity.  So, what happens
>>>>>>>>>> when n approaches 1 ? What happens when n=1? This is not to say
>>>>>>>>>> Rmerge is better than Rmeas. In fact, I believe the latter is
>>>>>>>>>> generally superior to the former, unless you are working near n
>>>>>>>>>> = 1. The sqrt(n/(n-1)) is trying to correct for bias in the R
>>>>>>>>>> statistic, but fighting one infinity with another infinity is a 
>>>>>>>>>> dangerous game.
>>>>>>>>>> 
>>>>>>>>>> My point is that neither Rmerge nor Rmeas are easily
>>>>>>>>>> interpreted without knowing the multiplicity.  If you see Rmeas
>>>>>>>>>> = 10% and the multiplicity is 10, then you know what that
>>>>>>>>>> means.  Same for Rmerge, since at n=10 both stats have nearly
>>>>>>>>>> the same value.  But if you have Rmeas = 45% and multiplicity =
>>>>>>>>>> 1.05, what does that mean?  Rmeas will be only 33% if the
>>>>>>>>>> multiplicity is rounded up to 1.1. This is what I mean by
>>>>>>>>>> "numerical instability", the value of the R statistic itself
>>>>>>>>>> becomes sensitive to small amounts of noise, and behaves more
>>>>>>>>>> and more like a random number generator. And if you have Rmeas
>>>>>>>>>> = 33% and no indication of multiplicity, it is hard to know
>>>>>>>>>> what is going on.  I personally am a lot more comfortable
>>>>>>>>>> seeing qualitative agreement between Rmerge and Rmeas, because that 
>>>>>>>>>> means the numerical instability of the multiplicity correction 
>>>>>>>>>> didn't mess anything up.
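The 45%-versus-33% arithmetic above follows from the average-multiplicity approximation Rmeas ≈ Rmerge * sqrt(n/(n-1)). A quick illustrative check (note this is only the bin-average shortcut; scaling programs apply the correction per reflection):

```python
import math

def rmeas_from_rmerge(rmerge, n):
    """Average-multiplicity approximation: Rmeas ~= Rmerge * sqrt(n/(n-1))."""
    return rmerge * math.sqrt(n / (n - 1.0))

# Rmeas = 45% at multiplicity 1.05 implies an underlying Rmerge of ~9.8%;
# the same data with the multiplicity rounded up to 1.1 gives Rmeas ~33%.
rmerge = 0.45 / math.sqrt(1.05 / 0.05)
print(round(rmeas_from_rmerge(rmerge, 1.10), 2))  # 0.33

# By multiplicity 10 the correction factor is only ~5%, so Rmeas ~= Rmerge.
print(round(math.sqrt(10 / 9), 3))  # 1.054
```

The correction factor sqrt(n/(n-1)) diverges as n approaches 1, which is exactly why the reported Rmeas is so sensitive to small changes in multiplicity in that regime.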
>>>>>>>>>> 
>>>>>>>>>> Of course, when the intensity is weak R statistics in general
>>>>>>>>>> are not useful.  Both Rmeas and Rmerge have the sum of all
>>>>>>>>>> intensities in the denominator, so when the bin-wide sum
>>>>>>>>>> approaches zero you have another infinity to contend with.
>>>>>>>>>> This one starts to rear its ugly head once I/sigma drops below
>>>>>>>>>> about 3, and this is why our ancestors always applied a sigma
>>>>>>>>>> cutoff before computing an R factor. Our small-molecule
>>>>>>>>>> colleagues still do this!  They call it "R1".  And it is an
>>>>>>>>>> excellent indicator of the overall relative error.  The
>>>>>>>>>> relative error in the outermost bin is not meaningful, and strangely 
>>>>>>>>>> enough nobody ever reported the outer-resolution Rmerge before 1995.
>>>>>>>>>> 
>>>>>>>>>> For weak signals, Correlation Coefficients are better, but for
>>>>>>>>>> strong signals CC pegs out at >95%, making it harder to see relative 
>>>>>>>>>> errors.
>>>>>>>>>> I/sigma is what we'd like to know, but the value of "sigma" is
>>>>>>>>>> still prone to manipulation by not just outlier rejection, but
>>>>>>>>>> massaging the so-called "error model".  Suffice it to say,
>>>>>>>>>> crystallographic data contain more than one type of error.
>>>>>>>>>> Some sources are important for weak spots, others are important
>>>>>>>>>> for strong spots, and still others are only apparent in the
>>>>>>>>>> mid-range.  Some sources of error are only important at low
>>>>>>>>>> multiplicity, and others only manifest at high multiplicity.
>>>>>>>>>> There is no single number that can be used to evaluate all aspects 
>>>>>>>>>> of data quality.
>>>>>>>>>> 
>>>>>>>>>> So, I remain a champion of reporting Rmerge.  Not in the
>>>>>>>>>> high-angle bin, because that is essentially a random number,
>>>>>>>>>> but overall Rmerge and low-angle-bin Rmerge next to
>>>>>>>>>> multiplicity, Rmeas, CC1/2 and other statistics is the only way
>>>>>>>>>> you can glean enough information about where the errors are
>>>>>>>>>> coming from in the data.  Rmeas is a useful addition because it
>>>>>>>>>> helps us correct for multiplicity without having to do math in
>>>>>>>>>> our head.  Users generally thank you for that. Rmerge, however,
>>>>>>>>>> has served us well for more than half a century, and I believe
>>>>>>>>>> Uli Arndt knew what he was doing.  I hope we all know enough
>>>>>>>>>> about history to realize that future generations seldom thank their 
>>>>>>>>>> ancestors for "protecting" them from information.
>>>>>>>>>> 
>>>>>>>>>> -James Holton
>>>>>>>>>> MAD Scientist
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 7/5/2017 10:36 AM, Graeme Winter wrote:
>>>>>>>>>>> Frank,
>>>>>>>>>>> 
>>>>>>>>>>> you are asking me to remove features that I like, so I would feel 
>>>>>>>>>>> that the challenge is for you to prove that this is harmful however:
>>>>>>>>>>> 
>>>>>>>>>>>    - at the minimum, I find it a useful checksum that the stats 
>>>>>>>>>>> are internally consistent (though I interpret it for lots of other 
>>>>>>>>>>> reasons too)
>>>>>>>>>>>    - it is faulty, I agree, but (with caveats) still useful IMHO
>>>>>>>>>>> 
>>>>>>>>>>> Sorry for being terse, but I remain to be convinced that
>>>>>>>>>>> removing it increases the amount of information
>>>>>>>>>>> 
>>>>>>>>>>> CC’ing BB as requested
>>>>>>>>>>> 
>>>>>>>>>>> Best wishes Graeme
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 5 Jul 2017, at 17:17, Frank von Delft 
>>>>>>>>>>>> <frank.vonde...@sgc.ox.ac.uk> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> You keep not answering the challenge.
>>>>>>>>>>>> 
>>>>>>>>>>>> It's really simple:  what information does Rmerge provide that 
>>>>>>>>>>>> Rmeas doesn't.
>>>>>>>>>>>> 
>>>>>>>>>>>> (If you answer, email to the BB.)
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 05/07/2017 16:04, graeme.win...@diamond.ac.uk wrote:
>>>>>>>>>>>>> Dear Frank,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You are essentially arguing that others are wrong if we feel an 
>>>>>>>>>>>>> existing statistic continues to be useful, and insisting that it 
>>>>>>>>>>>>> be outlawed so that we may not make use of it, just in case 
>>>>>>>>>>>>> someone misinterprets it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Very well
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I do however express disquiet that we as software developers feel 
>>>>>>>>>>>>> browbeaten to remove the output we find useful because “the 
>>>>>>>>>>>>> community” feel that it is obsolete.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I feel that Jacob’s short story on this thread illustrates that 
>>>>>>>>>>>>> educating the next generation of crystallographers to understand 
>>>>>>>>>>>>> what all of the numbers mean is critical, and that a 
>>>>>>>>>>>>> numerological approach of trying to optimise any one statistic is 
>>>>>>>>>>>>> essentially doomed. Precisely the same argument could be made for 
>>>>>>>>>>>>> people cutting the “resolution” at the wrong place in order to 
>>>>>>>>>>>>> improve the average I/sig(I) of the data set.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Denying access to information is not a solution to 
>>>>>>>>>>>>> misinterpretation, from where I am sat, however I acknowledge 
>>>>>>>>>>>>> that other points of view exist.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best wishes Graeme
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 5 Jul 2017, at 12:11, Frank von Delft 
>>>>>>>>>>>>> <frank.vonde...@sgc.ox.ac.uk> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Graeme, Andrew
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jacob is not arguing against an R-based statistic;  he's pointing 
>>>>>>>>>>>>> out that leaving out the multiplicity-weighting is prehistoric 
>>>>>>>>>>>>> (Diederichs & Karplus published it 20 years ago!).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So indeed:   Rmerge, Rpim and I/sigI give different information.  
>>>>>>>>>>>>> As you say.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> But no:   Rmerge and Rmeas and Rcryst do NOT give different 
>>>>>>>>>>>>> information.  Except:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     * Rmerge is a (potentially) misleading version of Rmeas.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     * Rcryst and Rmerge and Rsym are terms that no longer have 
>>>>>>>>>>>>> significance in the single cryo-dataset world.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> phx.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 05/07/2017 09:43, Andrew Leslie wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would like to support Graeme in his wish to retain Rmerge in 
>>>>>>>>>>>>> Table 1, essentially for exactly the same reasons.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also strongly support Francis Reyes comment about the 
>>>>>>>>>>>>> usefulness of Rmerge at low resolution, and I would add to his 
>>>>>>>>>>>>> list that it can also, in some circumstances, be more indicative 
>>>>>>>>>>>>> of the wrong choice of symmetry (too high) than the statistics 
>>>>>>>>>>>>> that come from POINTLESS (excellent though that program is!).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Andrew
>>>>>>>>>>>>> On 5 Jul 2017, at 05:44, Graeme Winter 
>>>>>>>>>>>>> <graeme.win...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Jacob
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, I got this - and I appreciate the benefit of Rmeas for 
>>>>>>>>>>>>> dealing with measuring agreement for small-multiplicity 
>>>>>>>>>>>>> observations. Having this *as well* is very useful and I agree 
>>>>>>>>>>>>> Rmeas / Rpim / CC-half should be the primary “quality” statistics.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, you asked if there is any reason to *keep* rather
>>>>>>>>>>>>> than *eliminate* Rmerge, and I offered one :o)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I do not see what harm there is reporting Rmerge, even if it is 
>>>>>>>>>>>>> just used in the inner shell or just used to capture a flavour of 
>>>>>>>>>>>>> the data set overall. I also appreciate that Rmeas converges to 
>>>>>>>>>>>>> the same value for large multiplicity i.e.:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>                                        Overall  InnerShell  OuterShell
>>>>>>>>>>>>> Low resolution limit                     39.02       39.02        1.39
>>>>>>>>>>>>> High resolution limit                     1.35        6.04        1.35
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Rmerge  (within I+/I-)                   0.080       0.057       2.871
>>>>>>>>>>>>> Rmerge  (all I+ and I-)                  0.081       0.059       2.922
>>>>>>>>>>>>> Rmeas (within I+/I-)                     0.081       0.058       2.940
>>>>>>>>>>>>> Rmeas (all I+ & I-)                      0.082       0.059       2.958
>>>>>>>>>>>>> Rpim (within I+/I-)                      0.013       0.009       0.628
>>>>>>>>>>>>> Rpim (all I+ & I-)                       0.009       0.007       0.453
>>>>>>>>>>>>> Rmerge in top intensity bin              0.050           -           -
>>>>>>>>>>>>> Total number of observations           1265512       16212       53490
>>>>>>>>>>>>> Total number unique                      17515         224        1280
>>>>>>>>>>>>> Mean((I)/sd(I))                           29.7       104.3         1.5
>>>>>>>>>>>>> Mn(I) half-set correlation CC(1/2)       1.000       1.000       0.778
>>>>>>>>>>>>> Completeness                             100.0        99.7       100.0
>>>>>>>>>>>>> Multiplicity                              72.3        72.4        41.8
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Anomalous completeness                   100.0       100.0       100.0
>>>>>>>>>>>>> Anomalous multiplicity                    37.2        42.7        21.0
>>>>>>>>>>>>> DelAnom correlation between half-sets    0.497       0.766      -0.026
>>>>>>>>>>>>> Mid-Slope of Anom Normal Probability     1.039           -           -
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (this is a good case for Rpim & CC-half as resolution limit
>>>>>>>>>>>>> criteria)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If the statistics you want to use are there & some others
>>>>>>>>>>>>> also, what is the pressure to remove them? Surely we want to
>>>>>>>>>>>>> educate on how best to interpret the entire table above to
>>>>>>>>>>>>> get a fuller picture of the overall quality of the data? My
>>>>>>>>>>>>> 0th-order request would be to publish the three shells as
>>>>>>>>>>>>> above ;o)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers Graeme
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4 Jul 2017, at 22:09, Keller, Jacob 
>>>>>>>>>>>>> <kell...@janelia.hhmi.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I suggested replacing Rmerge/sym/cryst with Rmeas, not Rpim. 
>>>>>>>>>>>>> Rmeas is simply Rmerge * sqrt(n/(n-1)), where n is the number of 
>>>>>>>>>>>>> measurements of that reflection. It's merely a way of correcting 
>>>>>>>>>>>>> for the multiplicity-related artifact of Rmerge, which is 
>>>>>>>>>>>>> becoming even more of a problem with data sets of increasing 
>>>>>>>>>>>>> variability in multiplicity. Consider the case of comparing a 
>>>>>>>>>>>>> data set with a multiplicity of 2 versus one of 100: equivalent 
>>>>>>>>>>>>> data quality would yield Rmerges diverging by a factor of ~1.4. 
>>>>>>>>>>>>> But this has all been covered before in several papers. It can be 
>>>>>>>>>>>>> and is reported in resolution bins, so can be used exactly as you 
>>>>>>>>>>>>> say. So, why not "disappear" Rmerge from the software?
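Both the per-reflection definitions and the factor-of-~1.4 divergence can be demonstrated numerically. This is an illustrative sketch with simulated Gaussian scatter of my own construction, not output from any scaling program:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def r_factors(groups):
    """Rmerge, Rmeas and Rpim from groups of symmetry-equivalent
    observations (one sequence per unique hkl; singletons skipped)."""
    num = {"Rmerge": 0.0, "Rmeas": 0.0, "Rpim": 0.0}
    den = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue
        dev = float(np.abs(obs - np.mean(obs)).sum())
        num["Rmerge"] += dev
        num["Rmeas"] += math.sqrt(n / (n - 1)) * dev   # multiplicity-corrected
        num["Rpim"] += math.sqrt(1.0 / (n - 1)) * dev  # precision-indicating
        den += float(np.sum(obs))
    return {k: v / den for k, v in num.items()}

# Identical underlying data quality (mean 100, sigma 10), different multiplicity.
low = [100 + 10 * rng.standard_normal(2) for _ in range(20000)]
high = [100 + 10 * rng.standard_normal(100) for _ in range(500)]
r_lo, r_hi = r_factors(low), r_factors(high)
print("Rmerge ratio (100-fold / 2-fold):", round(r_hi["Rmerge"] / r_lo["Rmerge"], 2))
print("Rmeas  ratio (100-fold / 2-fold):", round(r_hi["Rmeas"] / r_lo["Rmeas"], 2))
```

Rmeas comes out essentially the same for both data sets, while Rmerge differs by about sqrt(2): at multiplicity 2 the sample mean absorbs more of the noise, flattering Rmerge. That is the artifact described above.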
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The only reason I could come up with for keeping it is historical 
>>>>>>>>>>>>> reasons or comparisons to previous datasets, but anyway those 
>>>>>>>>>>>>> comparisons would be confounded by variabilities in multiplicity 
>>>>>>>>>>>>> and a hundred other things, so come on, developers, just comment 
>>>>>>>>>>>>> it out!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> JPK
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From:
>>>>>>>>>>>>> graeme.win...@diamond.ac.uk
>>>>>>>>>>>>> Sent: Tuesday, July 04, 2017 4:37 PM
>>>>>>>>>>>>> To: Keller, Jacob
>>>>>>>>>>>>> <kell...@janelia.hhmi.org>
>>>>>>>>>>>>> Cc: ccp4bb@jiscmail.ac.uk
>>>>>>>>>>>>> Subject: Re: [ccp4bb] Rmergicide Through Programming
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Jacob
>>>>>>>>>>>>> 
>>>>>>>>>>>>> An unbiased estimate of the true unmerged I/sig(I) of your data 
>>>>>>>>>>>>> (I find this particularly useful at low resolution), i.e. if 
>>>>>>>>>>>>> your inner-shell Rmerge is 10%, your data agree very poorly; if 
>>>>>>>>>>>>> it is 2%, your data agree very well, provided you have sensible 
>>>>>>>>>>>>> multiplicity… obviously this depends on sensible interpretation. 
>>>>>>>>>>>>> Rpim hides this (though it tells you more about the quality of 
>>>>>>>>>>>>> the average measurement).
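The link Graeme draws between inner-shell Rmerge and unmerged I/sig(I) can be made quantitative: for Gaussian errors the mean absolute deviation is sigma*sqrt(2/pi) ≈ 0.8*sigma, so at reasonable multiplicity Rmerge is roughly 0.8/(unmerged I/sigma), i.e. 10% corresponds to I/sigma near 8 and 2% to near 40. A quick simulation (my gloss on the argument, assuming that rule of thumb; not from the thread):

```python
import numpy as np

rng = np.random.default_rng(2)

def rmerge_of_uniform_bin(i_over_sigma, mult=20, n_hkl=5000):
    """Rmerge for a simulated bin in which every observation has the
    same per-observation I/sigma (Gaussian errors, Wilson intensities)."""
    i_true = rng.exponential(1.0, size=n_hkl)
    num = den = 0.0
    for i in i_true:
        obs = i + (i / i_over_sigma) * rng.standard_normal(mult)
        num += np.abs(obs - obs.mean()).sum()
        den += obs.sum()
    return num / den

# Mean absolute deviation of a Gaussian is sigma*sqrt(2/pi) ~ 0.8*sigma,
# so Rmerge ~ 0.8 / (unmerged I/sigma) at reasonable multiplicity:
print(round(rmerge_of_uniform_bin(8.0), 3))   # close to 0.8/8  = 0.1
print(round(rmerge_of_uniform_bin(40.0), 3))  # close to 0.8/40 = 0.02
```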
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Essentially, for I/sig(I) you can (by and large) adjust your 
>>>>>>>>>>>>> sig(I) values however you like, if you are so inclined. You can 
>>>>>>>>>>>>> only adjust Rmerge by excluding measurements.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would therefore defend that - amongst the other stats you
>>>>>>>>>>>>> enumerate below - it still has a place
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers Graeme
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4 Jul 2017, at 14:10, Keller, Jacob 
>>>>>>>>>>>>> <kell...@janelia.hhmi.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Rmerge does contain information which complements the others.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What information? I was trying to think of a counterargument to 
>>>>>>>>>>>>> what I proposed, but could not think of a reason in the world to 
>>>>>>>>>>>>> keep reporting it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> JPK
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4 Jul 2017, at 12:00, Keller, Jacob 
>>>>>>>>>>>>> <kell...@janelia.hhmi.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Dear Crystallographers,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Having been repeatedly chagrined about the continued use and 
>>>>>>>>>>>>> reporting of Rmerge rather than Rmeas or similar, I thought of a 
>>>>>>>>>>>>> potential way to promote the change: what if merging programs 
>>>>>>>>>>>>> would completely omit Rmerge/cryst/sym? Is there some reason to 
>>>>>>>>>>>>> continue to report these stats, or are they just grandfathered 
>>>>>>>>>>>>> into the software? I doubt that any journal or crystallographer 
>>>>>>>>>>>>> would insist on reporting Rmerge per se. So, I wonder what 
>>>>>>>>>>>>> developers would think about commenting out a few lines of their 
>>>>>>>>>>>>> code, seeing what happens? Maybe a comment to the effect of 
>>>>>>>>>>>>> "Rmerge is now deprecated; use Rmeas" would be useful as well. 
>>>>>>>>>>>>> Would something catastrophic happen?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> All the best,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jacob Keller
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *******************************************
>>>>>>>>>>>>> Jacob Pearson Keller, PhD
>>>>>>>>>>>>> Research Scientist
>>>>>>>>>>>>> HHMI Janelia Research Campus / Looger lab
>>>>>>>>>>>>> Phone: (571)209-4000 x3159
>>>>>>>>>>>>> Email:
>>>>>>>>>>>>> kell...@janelia.hhmi.org
>>>>>>>>>>>>> *******************************************
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> This e-mail and any attachments may contain confidential, 
>>>>>>>>>>>>> copyright and or privileged material, and are for the use of the 
>>>>>>>>>>>>> intended addressee only. If you are not the intended addressee or 
>>>>>>>>>>>>> an authorised recipient of the addressee please notify us of 
>>>>>>>>>>>>> receipt by returning the e-mail and do not use, copy, retain, 
>>>>>>>>>>>>> distribute or disclose the information in or attached to the 
>>>>>>>>>>>>> e-mail.
>>>>>>>>>>>>> Any opinions expressed within this e-mail are those of the 
>>>>>>>>>>>>> individual and not necessarily of Diamond Light Source Ltd.
>>>>>>>>>>>>> Diamond Light Source Ltd. cannot guarantee that this e-mail or 
>>>>>>>>>>>>> any attachments are free from viruses and we cannot accept 
>>>>>>>>>>>>> liability for any damage which you may sustain as a result of 
>>>>>>>>>>>>> software viruses which may be transmitted in or with the message.
>>>>>>>>>>>>> Diamond Light Source Limited (company no. 4375679).
>>>>>>>>>>>>> Registered in England and Wales with its registered office
>>>>>>>>>>>>> at Diamond House, Harwell Science and Innovation Campus,
>>>>>>>>>>>>> Didcot, Oxfordshire, OX11 0DE, United Kingdom
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
> 
> -- 
> John Berrisford
> PDBe
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD UK
> Tel: +44 1223 492529
> 
> http://www.pdbe.org
> http://www.facebook.com/proteindatabank
> http://twitter.com/PDBeurope
