Hi Tim,

I don't think the 5-10% or 500-1000 reflections are real rules, but rather practical choices. The error margin in R-free is roughly inversely proportional to the square root of the number of reflections in your test set and proportional to R-free itself. So for R-free to be 'significant' you need some absolute number of reflections to reach your cut-off of significance. This is where the 1000 comes from (500 is really pushing the limit).

You also want to make sure the error margins in R and R-free are not too far apart, and you probably want to keep the test set representative of the whole data set (this is particularly important because we use hold-out validation: you only get one shot at validating). This is where the 5-10% comes from. Another reason to go for 5-10% is that it makes 'full' (i.e. k-fold) cross-validation feasible: you only have to do 20 to 10 refinements. If you went for test sets of 1000 reflections instead, you would have to do 48 refinements for the average dataset.
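To put rough numbers on the two arguments above, here is a minimal Python sketch. It assumes the commonly quoted approximation sigma(R-free) ≈ R-free / sqrt(N_test); the exact prefactor depends on the derivation you follow, so treat the values as order-of-magnitude estimates. The function names are hypothetical, and the 48,000-reflection dataset is only what the "48 refinements" figure implies for test sets of 1000, not a number stated anywhere.

```python
import math


def rfree_sigma(r_free, n_test):
    """Approximate standard error of R-free.

    Assumption: sigma ~ R_free / sqrt(n_test). This is a rough estimate,
    not an exact result.
    """
    return r_free / math.sqrt(n_test)


def refinements_for_fraction(test_fraction):
    """Refinements needed for full k-fold cross-validation when each
    test set holds the given fraction of the data."""
    return round(1.0 / test_fraction)


def refinements_for_count(n_total, n_test):
    """Refinements needed for full k-fold cross-validation when each
    test set holds a fixed number of reflections."""
    return round(n_total / n_test)


if __name__ == "__main__":
    # Error margin for an R-free of 0.25 with 1000 vs. 500 test reflections.
    print(rfree_sigma(0.25, 1000))  # about 0.008
    print(rfree_sigma(0.25, 500))   # about 0.011

    # k-fold cost for 5% and 10% test sets.
    print(refinements_for_fraction(0.05))  # 20
    print(refinements_for_fraction(0.10))  # 10

    # k-fold cost with fixed 1000-reflection test sets
    # (assumed dataset size: 48,000 unique reflections).
    print(refinements_for_count(48_000, 1000))  # 48
```

With these assumptions, halving the test set from 1000 to 500 reflections raises the estimated error margin by a factor of sqrt(2), which is one way to read "500 is really pushing the limit".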
Personally, I take 5% and increase this percentage to a maximum of 10% if using 5% gives me a test set smaller than 1000 reflections.

HTH,
Robbie

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of Tim Gruene
> Sent: Tuesday, March 26, 2013 09:33
> To: [email protected]
> Subject: [ccp4bb] Rfree reflections
>
> Dear all,
>
> I recall that the set of Rfree reflections should be 500-1000, rather than
> 5-10%, but I cannot find the reference for it (maybe Ian Tickle?).
>
> I would therefore like to be confirmed or corrected:
>
> Is there an absolute number required for Rfree to be significant, i.e.
> 500-1000 irrespective of the total number of unique reflections in the data
> set, or is it 5-10% (as a compromise)?
>
> Thanks and regards,
> Tim
>
> --
> Dr Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
>
> GPG Key ID = A46BEE1A
