Re: [ccp4bb] Rfree reflections

Robbie Joosten Tue, 26 Mar 2013 02:24:59 -0700

Hi Tim,

I don't think the 5-10% or 500-1000 reflections are real rules, but rather
practical choices. The error margin in R-free decreases with the number of
reflections in your test set (roughly as its inverse square root) and is
proportional to R-free itself. So for R-free to be 'significant' you need
some absolute number of reflections to reach your cut-off of significance.
This is where the 1000 comes from (500 is really pushing the limit).
You also want to make sure the error margins in R and R-free are not too far
apart, and you probably want to keep the test set representative of the whole
data set (this is particularly important because we use hold-out validation:
you only get one shot at validating). This is where the 5-10% comes from.
Another consideration in favour of 5-10% is that it makes 'full' (i.e.
k-fold) cross-validation feasible: you only have to do 10 to 20 refinements.
If you went for 1000 reflections instead, you would have to do 48
refinements for the average data set.
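
To put rough numbers on the argument above, here is a minimal sketch. It
assumes the commonly quoted approximation sigma(R-free) ~ R-free / sqrt(N_test),
which is not spelled out in this message, and the function names are made up
for illustration. The 48000-reflection data set in the usage note is only what
"48 refinements" of 1000 reflections each would imply.

```python
import math


def rfree_sigma(r_free, n_test):
    """Approximate error margin of R-free.

    Assumption: sigma(R-free) ~ R-free / sqrt(n_test). This is a
    rule-of-thumb estimate, not an exact result.
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return r_free / math.sqrt(n_test)


def n_refinements_for_kfold(n_total, n_test):
    """Number of refinements needed for full k-fold cross-validation,
    i.e. how many non-overlapping test sets of size n_test fit in the data."""
    if n_total <= 0 or n_test <= 0:
        raise ValueError("reflection counts must be positive")
    return max(1, round(n_total / n_test))
```

With R-free = 0.25, rfree_sigma(0.25, 1000) is about 0.008 and
rfree_sigma(0.25, 500) about 0.011. For a hypothetical data set of 48000
reflections, n_refinements_for_kfold(48000, 1000) gives 48, whereas a 5% test
set (2400 reflections) gives 20.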


Personally, I take 5% and increase this percentage to a maximum of 10% if
using 5% gives me a test set smaller than 1000 reflections.
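
That rule of thumb could be written down roughly as follows. This is only a
sketch of the selection logic; the function name and parameter names are
hypothetical, and it does not correspond to any particular program's options.

```python
def choose_test_fraction(n_reflections, target=1000, low=0.05, high=0.10):
    """Pick the test-set fraction: start at `low` (5%), raise it just enough
    to reach `target` reflections, but never go above `high` (10%)."""
    if n_reflections <= 0:
        raise ValueError("n_reflections must be positive")
    needed = target / n_reflections  # fraction that yields exactly `target`
    return min(high, max(low, needed))
```

For example, 40000 reflections stay at 5% (2000 test reflections), 15000
reflections need about 6.7% to reach 1000, and 5000 reflections are capped at
10%, which leaves only 500 in the test set.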

HTH,
Robbie

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of
> Tim Gruene
> Sent: Tuesday, March 26, 2013 09:33
> To: [email protected]
> Subject: [ccp4bb] Rfree reflections
> 
> Dear all,
> 
> I recall that the set of Rfree reflections should be 500-1000, rather than
> 5-10%, but I cannot find the reference for it (maybe Ian Tickle?).
> 
> I would therefore like to be confirmed or corrected:
> 
> Is there an absolute number required for Rfree to be significant, i.e.
> 500-1000 irrespective of the total number of unique reflections in the data
> set, or is it 5-10% (as a compromise)?
> 
> Thanks and regards,
> Tim
> 
> --
> Dr Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
> 
> GPG Key ID = A46BEE1A
