Re: [ccp4bb] Rfree in similar data set

Ian Tickle Thu, 24 Sep 2009 07:26:08 -0700

Hi Dirk

But this is not the bias that I think most people are thinking of
(including, I would say, Mike, who asked the question originally), i.e.
where the 2nd dataset does *not* have the same test set as the 1st
dataset, so that some indices from the 1st working set will be in the
2nd test set, and vice versa.  It's surely the presumed Rfree bias (i.e.
Rfree presumed to be lower than its true value) *after* re-refining
against the 2nd dataset that Mike is concerned about.  In any case, as I
said before, it's not clear what use the Rfree against the 2nd dataset
is *before* doing any refinement against that dataset; it's surely the
Rfree *after* refinement (i.e. at convergence) that's relevant.
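To put a rough number on the overlap described above, here is a minimal sketch. It assumes each dataset's test set is drawn independently at random with the same free fraction (5% here, a hypothetical choice) from a common set of Miller indices. The helper name `pick_test_set` is illustrative only and is not part of any CCP4 program.

```python
import random

def pick_test_set(indices, free_fraction, rng):
    """Flag a random subset of reflections as the test (free) set."""
    n_free = round(free_fraction * len(indices))
    return set(rng.sample(indices, n_free))

# Hypothetical common set of unique reflections, labelled by (h, k, l).
indices = [(h, k, l) for h in range(20) for k in range(20) for l in range(20)]
rng = random.Random(0)

test1 = pick_test_set(indices, 0.05, rng)  # test set chosen for dataset 1
test2 = pick_test_set(indices, 0.05, rng)  # independent choice for dataset 2
work1 = set(indices) - test1               # working set used with dataset 1

# Fraction of dataset 2's test reflections that were refined against
# (i.e. were in the working set) when dataset 1 was used.
leaked = len(test2 & work1) / len(test2)
print(f"{leaked:.2f} of the new test set was in the old working set")
```

With independent choices the expected leaked fraction is simply 1 - f, where f is the free fraction, so here roughly 0.95 of the new test set was previously working-set data.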


Cheers

-- Ian

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Dirk Kostrewa
> Sent: 24 September 2009 13:21
> To: CCP4BB
> Subject: Re: [ccp4bb] Rfree in similar data set
> 
> Hi Ian,
> 
> consider the case where two data sets have been collected from the same
> crystal (or a crystal from the same drop), each processed separately, and
> the structure refined against one of the two data sets until convergence.
> The two data sets will be somewhat different due to measurement errors but
> still very similar. Thus, when I take the refined structure and re-refine
> it against the second data set using the same indices for working and test
> set (and the same refinement parameters), both the starting R and Rfree
> will not have converged against the second data set, but will be similar
> to the refined values from the first data set. The differences will be
> mainly caused by the measurement errors. It is this type of bias of the
> test set that (at least) I mean. After convergence of refinement against
> the second data set, both R and Rfree will then be very similar for the
> two data sets.
> 
> Best regards,
> 
> Dirk.
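A minimal sketch of the bookkeeping behind re-using "the same indices for working and test set", assuming both datasets are already indexed in the same setting and reduced to the same asymmetric unit. The function name `transfer_free_flags` and the 5% rate for newly measured reflections are hypothetical. In a real CCP4 workflow the existing free-R flag column would normally be carried over with the standard utilities; this only shows the logic.

```python
import random

def transfer_free_flags(old_flags, new_indices, free_fraction=0.05, seed=0):
    """Copy test-set flags from dataset 1 to dataset 2 by Miller index.

    old_flags maps (h, k, l) -> True if the reflection was in dataset 1's
    test set. Reflections of dataset 2 that dataset 1 never measured get a
    fresh random flag with probability free_fraction. Assumes identical
    indexing and asymmetric unit for both datasets (hypothetical helper).
    """
    rng = random.Random(seed)
    new_flags = {}
    for hkl in new_indices:
        if hkl in old_flags:
            new_flags[hkl] = old_flags[hkl]  # keep the original assignment
        else:
            new_flags[hkl] = rng.random() < free_fraction  # newly measured
    return new_flags

old = {(1, 0, 0): False, (0, 1, 0): True, (0, 0, 1): False}
new = transfer_free_flags(old, [(1, 0, 0), (0, 1, 0), (2, 0, 0)])
print(new[(0, 1, 0)])
```

The property that matters is that no reflection used as a working reflection against dataset 1 becomes a test reflection for dataset 2.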
> 
> On 24.09.2009 at 11:56, Ian Tickle wrote:
> 
> 
>       Hi, I beg to disagree with the 'perceived wisdom', including just
>       about everyone on this BB, but my answer is NO, there should be no
>       bias - *provided* you do the subsequent refinement properly.  First
>       off, Rfree is useless as any kind of statistical measure of
>       overfitting etc *unless* the refinement has converged to the point of
>       maximum log likelihood against the current working set.  So it's
>       meaningless to say that Rfree is biased 'initially' i.e. *before* any
>       further refinement is done using the new data because Rfree with the
>       new data has no meaning at that point - it's neither biased nor
>       unbiased, it's just meaningless!  In any case why would one want to
>       report an Rfree *before* refinement - what use is it?
> 
>       So we can only sensibly talk about the Rfree values *after* the
>       further refinement has converged - and if the refinement hasn't
>       converged then Rfree bias is the least of your worries!  So are
>       people really saying that the Rfree at convergence using the new data
>       is biased?  For that to be true it would have to be possible to arrive
>       at a different unbiased Rfree from another starting point.  But
>       provided your starting point wasn't a local maximum of the LL and you
>       haven't gotten into a local maximum along the way, convergence will be
>       to a unique global maximum of the LL, so the Rfree must be the same
>       whatever starting point is used (within the radius of convergence of
>       course).
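The argument above is that, within the radius of convergence, the end point of refinement (and hence Rfree) does not depend on where you start. A toy illustration, assuming a one-parameter linear model with a least-squares target standing in for the likelihood. This is a deliberate simplification: real refinement targets have many local optima, which is exactly the caveat attached above.

```python
import random

# Toy "dataset": y = 2x + Gaussian noise, split into working and test sets.
rng = random.Random(1)
xs = [i / 10 for i in range(1, 101)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
work = list(range(0, 100, 2))  # even positions: working set
free = list(range(1, 100, 2))  # odd positions: test (free) set

def refine(a, steps=500, lr=1e-4):
    """Gradient descent on the sum of squared residuals over the working set."""
    for _ in range(steps):
        grad = sum(-2.0 * xs[i] * (ys[i] - a * xs[i]) for i in work)
        a -= lr * grad
    return a

def r_factor(a, idx):
    """R = sum|y - a*x| / sum|y| over the given subset."""
    num = sum(abs(ys[i] - a * xs[i]) for i in idx)
    return num / sum(abs(ys[i]) for i in idx)

a_from_0 = refine(0.0)    # start far below the optimum
a_from_10 = refine(10.0)  # start far above the optimum
print(abs(a_from_0 - a_from_10) < 1e-9)
print(r_factor(a_from_0, free), r_factor(a_from_10, free))
```

Both runs stop at the same parameter, so the "free" R computed on the held-out points is identical regardless of the starting value.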
> 
>       The other cures suggested, such as SA (simulated annealing) and
>       randomisation, are IMO at best a waste of time and effort (i.e. it
>       will take longer for subsequent refinement to recover from the shock
>       to the system), and at worst likely to be worse than the disease they
>       purport to cure.  For example, how do you know what RMS shift to use
>       in the randomisation without causing the structure to jump into a
>       local maximum of the LL?  The resulting Rfree will certainly be
>       biased then!
> 
>       There is of course a different issue (and maybe this is what is
>       confusing some people) of comparing Rfree values from different test
>       sets: we showed that this introduces a random relative error in Rfree
>       of 1/sqrt(2*Nfree) (where Nfree = size of test set).  However this
>       effect is not bias, it's random sampling error.
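To see what that relative error means in practice, a minimal sketch that simply evaluates the formula 1/sqrt(2*Nfree) for a few test-set sizes and converts it to an absolute spread for an assumed Rfree of 0.25 (an illustrative value, not one from this thread). The helper name is hypothetical.

```python
import math

def rfree_sampling_error(rfree, n_free):
    """Absolute random error in Rfree, using relative error 1/sqrt(2*Nfree)."""
    return rfree / math.sqrt(2 * n_free)

for n_free in (500, 1000, 2000):
    print(n_free, round(rfree_sampling_error(0.25, n_free), 4))
```

For Nfree = 1000 the relative error is about 2.2%, i.e. roughly 0.0056 on an Rfree of 0.25; quadrupling the test set only halves it.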
> 
>       Cheers
> 
>       -- Ian
> 
> 
> 
>               -----Original Message-----
>               From: [email protected] [mailto:[email protected]] On
>               Behalf Of Mike England
>               Sent: 24 September 2009 04:31
>               To: [email protected]
>               Subject: Rfree in similar data set
>
>               Hi all,
>
>               I would appreciate your comments on the following case:
>
>               I have two datasets from the same or identical crystals.
>               Initially, I refine a structure against the first data set and
>               later on switch to another dataset for further refinements.
>               Do you think my Rfree will be biased, as Rfree reflections in
>               the second dataset may in fact be Rwork reflections in the
>               previous dataset?
>
>               Thanks in advance,
>
>               Mike
>
>       Disclaimer
>       This communication is confidential and may contain privileged
>       information intended solely for the named addressee(s). It may not be
>       used or disclosed except for the purpose for which it has been sent. If
>       you are not the intended recipient you must not review, use, disclose,
>       copy, distribute or take any action in reliance upon it. If you have
>       received this communication in error, please notify Astex Therapeutics
>       Ltd by emailing [email protected] and destroy all copies of
>       the message and any attached documents.
>       Astex Therapeutics Ltd monitors, controls and protects all its
>       messaging traffic in compliance with its corporate email policy. The
>       Company accepts no liability or responsibility for any onward
>       transmission or use of emails and attachments having left the Astex
>       Therapeutics domain.  Unless expressly stated, opinions in this
>       message are those of the individual sender and not of Astex
>       Therapeutics Ltd. The recipient should check this email and any
>       attachments for the presence of computer viruses. Astex Therapeutics
>       Ltd accepts no liability for damage caused by any virus transmitted by
>       this email. E-mail is susceptible to data corruption, interception,
>       unauthorized amendment, and tampering, Astex Therapeutics Ltd only
>       send and receive e-mails on the basis that the Company is not liable
>       for any such alteration or any consequences thereof.
>       Astex Therapeutics Ltd., Registered in England at 436 Cambridge
>       Science Park, Cambridge CB4 0QA under number 3751674
> 
> 
> 
> 
> *******************************************************
> Dirk Kostrewa
> Gene Center, A 5.07
> Ludwig-Maximilians-University
> Feodor-Lynen-Str. 25
> 81377 Munich
> Germany
> Phone:  +49-89-2180-76845
> Fax:  +49-89-2180-76999
> E-mail: [email protected]
> WWW: www.genzentrum.lmu.de
> *******************************************************
> 