Using a percentage might be justified as a trade-off between having ample free
reflections for statistics and not cutting too deeply into the completeness of
the working set used for refinement.
It seems to be generally agreed that 2000 free reflections is sufficient to
guide your choice of refinement strategy. You don't _need_ more. I think it is
also agreed that 95% completeness of data is sufficient to solve a structure
with high quality. You don't _need_ more to solve it.
But if you want to look at R and R-free in 40 resolution bins, that's only 50
free reflections in each bin, and you are likely to see unexpected things like
Rfree < Rwork in some bins just by statistics. More would be better. But if we
have a lot of residues just on the Ramachandran border between allowed and
generously allowed, we might suspect that more working reflections would help
to pull them inside. There is a trade-off.
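To put rough numbers on the per-bin argument, here is a minimal Python sketch. It assumes the free set is spread evenly over equal-count resolution bins and uses the rule of thumb sigma(R) ≈ R/sqrt(n) for an R value computed from n reflections; that approximation is my assumption, not something stated in this thread. The overall Rfree of 0.25 is a made-up illustrative value, and the function names are hypothetical.

```python
import math


def free_per_bin(n_free: int, n_bins: int) -> float:
    # Assumption: the free set is spread evenly over equal-count resolution bins.
    return n_free / n_bins


def approx_sigma_r(r: float, n: float) -> float:
    # Rough rule of thumb (assumed here): sigma(R) ~ R / sqrt(n)
    # for an R value computed from n reflections.
    return r / math.sqrt(n)


if __name__ == "__main__":
    r_free = 0.25  # hypothetical overall Rfree, for illustration only
    per_bin = free_per_bin(2000, 40)
    print(f"free reflections per bin: {per_bin:.0f}")
    print(f"approx sigma(Rfree), whole free set: {approx_sigma_r(r_free, 2000):.4f}")
    print(f"approx sigma(Rfree), one bin: {approx_sigma_r(r_free, per_bin):.4f}")
```

Under these assumptions the per-bin spread (about 0.035) is sqrt(40) ≈ 6.3 times that of the whole 2000-reflection free set (about 0.0056). That is large enough that Rfree can dip below Rwork in an individual bin purely by chance.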
If your entire dataset is only 20,000 reflections, 2000 free reflections already
means giving up 10% of your data for refinement, so you make do with 2000 or even
1000 free reflections and use fewer bins. If you have 400,000 reflections, you
wouldn't think twice about using 4,000 or 5,000 reflections to get better
statistics in the bins. You probably wouldn't want to use 5% (20,000 reflections)
in that case, but if you did it might not greatly hurt your structure. So I guess
we need some guideline: as we get more and more reflections, what is the best way
to allocate between free and working? Probably something between a fixed absolute
minimum number of free reflections and a constant percentage of reflections,
perhaps with a cap above which it really wouldn't help to have more free
reflections. I believe one of the non-CCP4 crystallography packages does have a
cap, using the lesser of a fixed percentage and a fixed constant number. Yes, the
default in a recent .eff file is:
fraction = 0.1
max_free = 2000
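The defaults above amount to "take the lesser of 10% of the reflections and 2000". A minimal Python sketch of that rule follows, alongside a hypothetical variant that also adds the fixed minimum suggested above. The function names are made up, rounding to the nearest integer is my assumption (the actual program may round differently), and the 5% / 1000 / 5000 values in the variant are purely illustrative, not defaults of any package.

```python
def n_free_capped(n_total: int, fraction: float = 0.1, max_free: int = 2000) -> int:
    """Lesser of a fixed fraction of the data and a fixed maximum count."""
    # Rounding to nearest is an assumption; the real program may differ.
    return min(round(n_total * fraction), max_free)


def n_free_clamped(n_total: int, fraction: float, min_free: int, max_free: int) -> int:
    """Hypothetical variant: a percentage held between a floor and a cap."""
    if min_free > max_free:
        raise ValueError("min_free must not exceed max_free")
    n = min(max(round(n_total * fraction), min_free), max_free)
    return min(n, n_total)  # never flag more reflections than exist


if __name__ == "__main__":
    for n_total in (8_000, 20_000, 400_000):
        print(n_total,
              n_free_capped(n_total),
              n_free_clamped(n_total, fraction=0.05, min_free=1000, max_free=5000))
```

For 8,000, 20,000 and 400,000 reflections the capped rule gives 800, 2000 and 2000 free reflections, while the illustrative clamped variant gives 1000, 1000 and 5000. The floor protects bin statistics on small datasets at the cost of a larger fraction of the data (12.5% of 8,000). The cap stops the free set from growing once more reflections no longer help.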
Of course, having a huge number of reflections doesn't necessarily mean an
over-determined structure. We could be talking about a ribosome at 4 Å or
something, tough building, and you could be loath to give up more data than
you have to.
eab
On 11/20/2014 05:43 PM, Keller, Jacob wrote:
Dear Crystallographers,
I thought that for reliable values of Rfree, one only needs to satisfy
counting statistics, and therefore using at most a couple thousand reflections
should always be sufficient. Almost always, however, some seemingly arbitrary
percentage of reflections is used, say 5%. Is there any rationale for using a
percentage rather than some absolute number like 1000?
All the best,
Jacob
*******************************************
Jacob Pearson Keller, PhD
Looger Lab/HHMI Janelia Research Campus
19700 Helix Dr, Ashburn, VA 20147
*******************************************