Re: [ccp4bb] How many reflections for Rfree?

Randy Read Tue, 17 Jun 2008 07:31:09 -0700

Hi,

Apart from the issue of what to tell referees, there are two slightly different practical issues. One question that has been addressed is how precise the estimate of Rfree will be for a certain number of test set reflections, and this has been discussed in papers by Axel and by Ian Tickle.

But there's another important issue: how many reflections do you need to get a good estimate of the sigmaA values (as a function of resolution) needed to calibrate the likelihood target? I know I've discussed this in talks, but it doesn't look like I ever published anything about it. Perhaps someone else has.

There isn't a hard and fast answer, but there are probably reasonable rules of thumb. My impression is that you gain relatively little by adding more reflections, once you have a total of about 1000 or at most 2000 in the cross-validation set. However, giving up more than 10% of the data is probably a bad idea, even if the sigmaA estimates are somewhat less accurate. I've had reasonable results refining against data sets of 3000-5000 reflections, setting aside only 10% (i.e. 300-500 reflections) for cross-validation.


So here's the recipe I would use, for what it's worth:
   <10000 reflections:    set aside 10%
   10000-20000 reflections:  set aside 1000 reflections
   20000-40000 reflections:  set aside 5%
   >40000 reflections: set aside 2000 reflections

I'm sure that with a bit of thought someone could come up with a smooth function that achieves something similar, but it seems adequate.

In case anyone is interested, the reason this is a bit simplistic is that the number of reflections you need depends on how good your model is. If you look at the contribution to the likelihood function from one reflection, it is very broad for low sigmaA values and becomes sharper as the sigmaA values increase. This means that, if the true value of sigmaA is low, you need more reflections to get a precise estimate than if the true sigmaA value is high. This happens because, if sigmaA is low, any value of Fo could be expected for a particular Fc because the model predicts the data poorly, but if sigmaA is high, then there is a very restricted range of possible values for Fo given Fc. So to get stable refinement from a very poor model, you might need to set aside a larger number of reflections for cross-validated sigmaA estimation. Later on, when the model is better, you could afford to absorb some of those reflections into the working set.
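
To put rough numbers on the "restricted range of possible values for Fo given Fc", here is a minimal sketch. It assumes the usual sigmaA formulation with normalized structure factors (Eo, Ec), acentric reflections and no measurement error, in which the complex Eo is distributed about sigmaA*Ec with variance 1 - sigmaA^2. The function name `conditional_variance` is hypothetical, and the sketch only shows how the spread of Eo narrows as sigmaA rises; it does not compute how many reflections are needed.

```python
def conditional_variance(sigma_a: float) -> float:
    """Variance of the complex normalized Eo about sigma_a * Ec.

    Assumes acentric reflections and no measurement error, so the
    variance is simply 1 - sigma_a**2.
    """
    if not 0.0 <= sigma_a <= 1.0:
        raise ValueError("sigma_a must lie between 0 and 1")
    return 1.0 - sigma_a ** 2


if __name__ == "__main__":
    # Poor model (low sigmaA): variance close to 1, Ec says little about Eo.
    # Good model (high sigmaA): variance close to 0, Eo is tightly restricted.
    for sigma_a in (0.1, 0.5, 0.9, 0.99):
        print(f"sigmaA = {sigma_a:4.2f}  variance = {conditional_variance(sigma_a):.4f}")
```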

Now, what are the chances that such a procedure would pass through the refereeing system without raising any eyebrows?


Randy Read

On 17 Jun 2008, at 13:50, Anastassis Perrakis wrote:

Hi-

I am afraid that the real issue might be that the real question is:
'How do I tell the referee of my paper that 1500 reflections are enough?'
Something along the lines of a statement like:
"It is generally accepted by the X-ray crystallography community that 1000-1500 are far enough for estimating Rfree, and in fact as few as 150 reflections can be enough. For a reference, the editor could check the ccp4bb archive or the crystallography wikis and references therein."
Now ... that's fun ...:

The IUCr Wiki says:
"A fixed percentage of the total number of reflections is usually assigned to the free group."
And it references only the MiE paper of Axel in 1997 where I read:

In all test calculations to date, the free R value has
been highly correlated with the phase accuracy of the atomic model. In
practice, about 5-10% of the observed diffraction data (chosen at random
from the unique reflections) become sequestered in the test set. The size
of the test set is a compromise between the desire to minimize statistical
fluctuations of the free R value and the need to avoid a deleterious effect
on the atomic model by omission of too much experimental data.

I could not find the full text of the 1992 paper.


CCP4 [User Community] Wiki
The 'user' one has links to the IUCr Wiki and the "test set" article is still not written.
The 'official' one does not touch the subject either.

Time to write it (with appropriate references!!!) ?

Tassos



On Jun 16, 2008, at 20:21, Mark J. van Raaij wrote:
Dear All,
It has recently been put to me that 5-10% of reflections should always be set aside for calculation of Rfree. However, when one has, say, 200 000 reflections (high resolution and/or large asymmetric unit), that seems to me to be a waste, because it would mean removing 10 000 to 20 000 reflections from the refinement target.
My opinion is based on discussions with a particular knowledgeable crystallographer and a meeting report (http://journals.iucr.org/d/issues/1996/01/00/li0216/li0216.pdf), in which 1000 reflections is said to be enough (i.e. statistically valid). As a result, I always take 1000-1500 reflections, independently of the number of total reflections, adjusting the percentage or the width of the thin shells (in case of NCS) accordingly.
What is the opinion of the community? And, if you agree with me, how should we try to get this opinion (more) generally accepted?
On a related note, how to refine a structure with only 5000 reflections, which could happen when you have a small a.u. and modest resolution? Could, exceptionally, a lower absolute amount of reflections be used for Rfree, say 500?
Greetings,

Mark


Mark J. van Raaij
Dpto de Bioquímica, Facultad de Farmacia
Universidad de Santiago
15782 Santiago de Compostela
Spain
http://web.usc.es/~vanraaij/



------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research      Tel: + 44 1223 336500
Wellcome Trust/MRC Building                   Fax: + 44 1223 336827
Hills Road                                    E-mail: [EMAIL PROTECTED]

Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk
