Right about the 1000 in that case, but Rfree from a 5% test set would also be statistically poor. I guess one would be stuck in that case.
JPK

From: Pavel Afonine [mailto:[email protected]]
Sent: Friday, November 21, 2014 11:16 AM
To: Keller, Jacob
Cc: [email protected]
Subject: Re: [ccp4bb] Free Reflections as Percent and not a Number

Oh I see, I thought the answer follows from that. A fraction is better (or maybe a fraction with a cap). Hardwiring a number may not always work. For small crystals, small data sets, or incomplete data sets, 1000 reflections may mean, say, 50% of the data set.

All the best,
Pavel

On Fri, Nov 21, 2014 at 8:09 AM, Keller, Jacob <[email protected]> wrote:

Agree with all of this, but how does it reflect on the original question of whether to use a percent or an absolute number?

JPK

From: Pavel Afonine [mailto:[email protected]]
Sent: Friday, November 21, 2014 11:02 AM
To: Keller, Jacob
Cc: [email protected]
Subject: Re: [ccp4bb] Free Reflections as Percent and not a Number

Hello,

the choice of the size of the free (or test, whatever you like to call them) reflection set is important for three different purposes:

- estimation of parameters for the ML target for refinement;
- map calculation (coefficients m and D in 2mFo-DFc or mFo-DFc maps are calculated using test reflections);
- validation (calculation of Rfree).

It is important that free reflections are evenly distributed across the whole resolution range, and that each sufficiently thin resolution bin contains at least 50 test reflections, so that the estimation of ML parameters is robust and reliable. A "sufficiently thin resolution bin" is one in which the ML parameters can be assumed constant.

Smaller test sets will result in less stable refinements (the refinement outcome will depend strongly on the choice of test set). Larger test sets will damage map quality (unless all reflections are used in map calculation). The free set needs to be large enough that Rfree is statistically meaningful.

Nothing new is said above, it's all documented in the literature!

Pavel

On Thu, Nov 20, 2014 at 2:43 PM, Keller, Jacob <[email protected]> wrote:

Dear Crystallographers,

I thought that for reliable values for Rfree, one needs only to satisfy counting statistics, and therefore using at most a couple thousand reflections should always be sufficient. Almost always, however, some seemingly-arbitrary percentage of reflections is used, say 5%. Is there any rationale for using a percentage rather than some absolute number like 1000?

All the best,
Jacob
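
The suggestions made in this thread (a fraction with a cap, test reflections spread evenly over resolution, and at least 50 test reflections per bin) can be put into a rough sketch. This is only an illustration, not the procedure of any refinement program: the function names, the cap of 2000, the choice of 10 equal-count bins, and the fixed random seed are all assumptions made for the example.

```python
import random


def free_set_size(n_reflections, fraction=0.05, cap=2000):
    """Test-set size as a fraction of the data, limited by a cap.

    The 5% fraction and the cap of 2000 are illustrative defaults only.
    """
    return min(cap, round(fraction * n_reflections))


def pick_free_flags(d_spacings, fraction=0.05, cap=2000, n_bins=10, seed=0):
    """Flag test reflections evenly across equal-count resolution bins.

    d_spacings: one resolution value (in angstroms) per reflection.
    Returns (flags, per_bin_counts); flags[i] is True for a test reflection.
    The number of bins is an assumption, not a recommended value.
    """
    n = len(d_spacings)
    n_free = free_set_size(n, fraction, cap)
    rng = random.Random(seed)
    # Order reflection indices from low to high resolution (large to small d).
    order = sorted(range(n), key=lambda i: d_spacings[i], reverse=True)
    flags = [False] * n
    per_bin_counts = []
    for b in range(n_bins):
        # Equal-count bins taken from the sorted list.
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        # Split n_free across bins as evenly as integer arithmetic allows;
        # the min() guards against bins with too few reflections.
        k = (b + 1) * n_free // n_bins - b * n_free // n_bins
        k = min(k, len(members))
        for i in rng.sample(members, k):
            flags[i] = True
        per_bin_counts.append(k)
    return flags, per_bin_counts


def bins_below_minimum(per_bin_counts, min_per_bin=50):
    """Indices of bins with fewer test reflections than the stated minimum."""
    return [b for b, k in enumerate(per_bin_counts) if k < min_per_bin]
```

With these defaults, a data set of 20,000 reflections gets 1000 test reflections, 100 in each of the 10 bins, and a data set of 100,000 is capped at 2000. A data set of only 2,000 reflections gets 100 test reflections, 10 per bin, so every bin falls below the 50-reflection minimum. That is the "stuck" case from the top of the thread: neither a 5% fraction nor a fixed 1000 works well there.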
