Another issue with these statistics is that the PDB insists on a single value of "resolution" no matter how anisotropic the data are. Especially in the outermost bins, Rmerge can be ridiculously high simply because measurable data exist in only one of the three directions.

Phoebe
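The point about anisotropic outer shells can be illustrated with a small sketch. It uses the standard definition Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i; the reflections, intensities, and the names `rmerge`, `strong_direction`, and `weak_directions` are made up for illustration and are not taken from any real data set.

```python
def rmerge(observations):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    `observations` maps each unique reflection (h, k, l) to the list of
    its repeated intensity measurements.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical outer-shell reflections. Every reflection has the same
# +/-5 scatter between its two measurements; only the signal differs.
strong_direction = {(0, 0, 20): [100.0, 110.0]}   # direction that still diffracts
weak_directions = {(20, 0, 0): [12.0, 2.0],       # directions with almost no signal
                   (0, 20, 0): [12.0, 2.0]}

r_strong = rmerge(strong_direction)
r_shell = rmerge({**strong_direction, **weak_directions})

print(f"Rmerge, strong direction only: {r_strong:.3f}")
print(f"Rmerge, whole shell:           {r_shell:.3f}")
```

With these made-up numbers, Rmerge rises from about 0.05 for the well-diffracting direction alone to about 0.13 for the whole shell, even though the measurement scatter is identical for every reflection.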
=====================================
Phoebe A. Rice
Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.rsc.org/shop/books/2008/9780854042722.asp

---- Original message ----
>Date: Tue, 26 Oct 2010 09:46:46 -0700
>From: CCP4 bulletin board <[email protected]> (on behalf of "Bernhard Rupp
>(Hofkristallrat a.D.)" <[email protected]>)
>Subject: [ccp4bb] Against Method (R)
>To: [email protected]
>
>Hi Folks,
>
>Please allow me a few biased reflections/opinions on the numeRology of the
>R-value (not R-factor, because it is neither a factor itself nor does it
>factor in anything but ill-posed reviewers' critique. Historically the term
>originated from small molecule crystallography, but it is only a
>'Residual-value')
>
>a) The R-value itself - based on the linear residuals and of apparent
>intuitive meaning - is statistically peculiar to say the least. I could not
>find it in any common statistics text. So doing proper statistics with R
>becomes difficult.
>
>b) Rules of thumb (as much as they conveniently obviate the need for
>detailed explanations, satisfy students' desire for quick answers, and
>allow superficial review of manuscripts) become less valuable if they have a
>case-dependent large variance, topped with an unknown parent distribution.
>Combined with an odd statistic, that has great potential for misguidance and
>unnecessarily lost sleep.
>
>c) Ian has (once again) explained that, for example, Rf-R depends on the
>exact knowledge of the restraints and their individual weighting, which we
>generally do not have. Caution is advised.
>
>d) The answer to which model is better - which is actually what you want to
>know - becomes a question of model selection or hypothesis testing, which,
>given the obscurity of R, cannot be derived with some nice plug-in method. As
>Ian said, the models to be compared must also be based on the same and
>identical data.
>
>e) One measure available that is statistically at least defensible is the
>log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes
>factor (there is the darn factor again, it’s a ratio)) and see where this
>falls - and the answers are pretty soft and, probably because of that,
>correspondingly realistic. This also makes - based on statistics alone -
>deciding between different overall parameterizations difficult.
>
>http://en.wikipedia.org/wiki/Bayes_factor
>
>f) So having said that, what really remains is that the model that fits the
>primary evidence (minimally biased electron density) best and is at the same
>time physically meaningful is the best model, i.e., all plausibly
>accountable electron density (and not more) is modeled. You can convince
>yourself of this by taking the most interesting part of the model out (say a
>ligand or a binding pocket) and looking at the R-values or doing a model
>selection test - the result will be indecisive. Poof goes the global rule of
>thumb.
>
>g) In other words: global measures in general are entirely inadequate to
>judge local model quality (noted many times over already by Jones, Kleywegt,
>others, in the dark ages of crystallography when poorly restrained
>crystallographers used to passionately whack each other over the head with
>unfree R-values).
>
>Best, BR
>-----------------------------------------------------------------
>Bernhard Rupp, Hofkristallrat a.D.
>001 (925) 209-7429
>+43 (676) 571-0536
>[email protected]
>[email protected]
>http://www.ruppweb.org/
>------------------------------------------------------------------
>And once again a chillout mix from the Hofkristall lounge
>------------------------------------------------------------------
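Points (a), (d) and (e) of the quoted message can be sketched in a few lines. This is only a toy illustration under loud assumptions: independent Gaussian errors with known sigmas (real crystallographic likelihood targets are more involved), no adjustable parameters, and invented amplitudes. The names `r_value`, `gaussian_log_likelihood`, `model_a`, and `model_b` are hypothetical. Both models are scored against the same and identical data, as point (d) requires.

```python
import math


def r_value(f_obs, f_calc):
    """Linear residual R = sum | |Fo| - |Fc| | / sum |Fo|."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def gaussian_log_likelihood(f_obs, f_calc, sigma):
    """Log-likelihood under an ASSUMED independent Gaussian error model."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (o - c) ** 2 / (2.0 * s * s)
        for o, c, s in zip(f_obs, f_calc, sigma)
    )


# Invented amplitudes and uncertainties, shared by both models.
f_obs = [100.0, 50.0, 20.0, 10.0]
sigma = [5.0, 3.0, 2.0, 2.0]

# Two hypothetical models' calculated amplitudes.
model_a = [98.0, 52.0, 21.0, 9.0]
model_b = [97.0, 53.0, 19.0, 12.0]

r_a = r_value(f_obs, model_a)
r_b = r_value(f_obs, model_b)

# Log-likelihood ratio of model A over model B on identical data.
llr = (gaussian_log_likelihood(f_obs, model_a, sigma)
       - gaussian_log_likelihood(f_obs, model_b, sigma))

print(f"R(A) = {r_a:.3f}, R(B) = {r_b:.3f}")
print(f"log-likelihood ratio A/B = {llr:.3f}, ratio = {math.exp(llr):.2f}")
```

With these invented numbers the R-values are about 0.033 and 0.050, while the log-likelihood ratio is about 0.75, a likelihood ratio of roughly 2. That is the kind of soft answer point (e) describes: the preference for one model exists but is far from decisive.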
