Dear Zbyszek,

     That is a useful point. Another way of making it is to notice that the
correlation coefficient between two random variables is the cosine of the
angle between the two vectors of paired sample values, with the proviso that
each vector is first centred so that its components sum to zero. The fact
that an angle is involved means that the CC is independent of scale, while
the fact that it is the cosine of that angle makes it rather insensitive to
small-ish angles: a cosine remains close to 1.0 for quite a range of angles.
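
     In case a concrete check is helpful, here is a small Python/NumPy sketch
(my own toy example; the variable names are mine) showing that the Pearson CC
of two paired samples is exactly the cosine of the angle between the
mean-centred vectors, and how slowly that cosine falls away from 1.0 for
small angles:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 0.3 * rng.normal(size=1000)       # correlated with x, plus noise

# Centre both vectors so that their components sum to zero.
xc, yc = x - x.mean(), y - y.mean()

pearson = np.corrcoef(x, y)[0, 1]
cosine = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
print(pearson, cosine)                    # identical up to rounding

# Insensitivity to small angles: even 5 degrees still gives cos ~ 0.996.
for deg in (1, 2, 5, 10, 20):
    print(deg, np.cos(np.radians(deg)))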

     This is presumably the "nature of correlation coefficients" you were
referring to.
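
     And to put your numbers into that picture, a rough simulation (my own
toy model with Wilson-like exponentially distributed intensities, not the
exact CC1/2 calculation done by the data-processing programs) already shows
the effect: 6% relative error per observation at 8-fold multiplicity, split
into two half-datasets of four observations each, gives a CC1/2 in the high
99% range (the precise figure depends on the assumed intensity distribution):

import numpy as np

rng = np.random.default_rng(1)
n_refl, mult, rel_err = 100_000, 8, 0.06

# Wilson-like true intensities (exponential distribution), one per reflection.
I_true = rng.exponential(scale=1.0, size=n_refl)

# 8 observations per reflection, each carrying a 6% relative Gaussian error.
obs = I_true[:, None] * (1 + rel_err * rng.normal(size=(n_refl, mult)))

# Average the two random halves (4 observations each) and correlate them.
half1 = obs[:, : mult // 2].mean(axis=1)
half2 = obs[:, mult // 2 :].mean(axis=1)
print("CC1/2 ~", np.corrcoef(half1, half2)[0, 1])   # ~0.998 for this toy model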


     With best wishes,
     
          Gerard.

--
On Fri, Dec 07, 2012 at 11:14:50AM -0600, Zbyszek Otwinowski wrote:
> The difference between one and the correlation coefficient is a quadratic
> function of the differences between the data points. So even a rather large
> 6% relative error, with 8-fold data multiplicity (redundancy), can lead to
> CC1/2 values of about 99.9%.
> It is just the nature of correlation coefficients.
> 
> Zbyszek Otwinowski
> 
> 
> 
> > Related to this, I've always wondered what CC1/2 values mean for low
> > resolution. Not being mathematically inclined, I'm sure this is a naive
> > question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean?
> > Does it mean the data is as good as it gets?
> >
> > Alan
> >
> >
> >
> > On 07/12/2012 17:15, Douglas Theobald wrote:
> >> Hi Boaz,
> >>
> >> I read the K&K paper as primarily a justification for including
> >> extremely weak data in refinement (and of course introducing a new
> >> single statistic that can judge data *and* model quality comparably).
> >> Using CC1/2 to gauge resolution seems like a good option, but I never
> >> got from the paper exactly how to do that.  The resolution bin where
> >> CC1/2=0.5 seems natural, but in my (limited) experience that gives
> >> almost the same answer as I/sigI=2 (see also K&K fig 3).
> >>
> >>
> >>
> >> On Dec 7, 2012, at 6:21 AM, Boaz Shaanan <bshaa...@exchange.bgu.ac.il>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm sure Kay will have something to say  about this but I think the
> >>> idea of the K & K paper was to introduce new (more objective) standards
> >>> for deciding on the resolution, so I don't see why another table is
> >>> needed.
> >>>
> >>> Cheers,
> >>>
> >>>
> >>>
> >>>
> >>>            Boaz
> >>>
> >>>
> >>> Boaz Shaanan, Ph.D.
> >>> Dept. of Life Sciences
> >>> Ben-Gurion University of the Negev
> >>> Beer-Sheva 84105
> >>> Israel
> >>>
> >>> E-mail: bshaa...@bgu.ac.il
> >>> Phone: 972-8-647-2220  Skype: boaz.shaanan
> >>> Fax:   972-8-647-2992 or 972-8-646-1710
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ________________________________________
> >>> From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas
> >>> Theobald [dtheob...@brandeis.edu]
> >>> Sent: Friday, December 07, 2012 1:05 AM
> >>> To: CCP4BB@JISCMAIL.AC.UK
> >>> Subject: [ccp4bb] refining against weak data and Table I stats
> >>>
> >>> Hello all,
> >>>
> >>> I've followed with interest the discussions here about how we should be
> >>> refining against weak data, e.g. data with I/sigI << 2 (perhaps using
> >>> all bins that have a "significant" CC1/2 per Karplus and Diederichs
> >>> 2012).  This all makes statistical sense to me, but now I am wondering
> >>> how I should report data and model stats in Table I.
> >>>
> >>> Here's what I've come up with: report two Table I's.  For comparability
> >>> to legacy structure stats, report a "classic" Table I, where I define the
> >>> resolution as the bin at which I/sigI=2.  Use that as my "high res" bin, with
> >>> high res bin stats reported in parentheses after global stats.   Then
> >>> have another Table (maybe Table I* in supplementary material?) where I
> >>> report stats for the whole dataset, including the weak data I used in
> >>> refinement.  In both tables report CC1/2 and Rmeas.
> >>>
> >>> This way, I don't redefine the (mostly) conventional usage of
> >>> "resolution", my Table I can be compared to precedent, I report stats
> >>> for all the data and for the model against all data, and I take
> >>> advantage of the information in the weak data during refinement.
> >>>
> >>> Thoughts?
> >>>
> >>> Douglas
> >>>
> >>>
> >>> ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
> >>> Douglas L. Theobald
> >>> Assistant Professor
> >>> Department of Biochemistry
> >>> Brandeis University
> >>> Waltham, MA  02454-9110
> >>>
> >>> dtheob...@brandeis.edu
> >>> http://theobald.brandeis.edu/
> >>>
> >>>             ^\
> >>>   /`  /^.  / /\
> >>> / / /`/  / . /`
> >>> / /  '   '
> >>> '
> >>>
> >>
> >>
> >
> > --
> > Alan Cheung
> > Gene Center
> > Ludwig-Maximilians-University
> > Feodor-Lynen-Str. 25
> > 81377 Munich
> > Germany
> > Phone:  +49-89-2180-76845
> > Fax:  +49-89-2180-76999
> > E-mail: che...@lmb.uni-muenchen.de
> >

-- 

     ===============================================================
     *                                                             *
     * Gerard Bricogne                     g...@globalphasing.com  *
     *                                                             *
     * Global Phasing Ltd.                                         *
     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
     *                                                             *
     ===============================================================
