Yes, that's true. In such corner cases phenix.refine, for example, switches to the least-squares (LS) target. Alternatively, you can generate many (say 100) different test sets and run 100 refinements. This will give you an ensemble of slightly different structures with slightly different Rwork and Rfree, from which you can derive the uncertainty due to the choice of test set.
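To make the bookkeeping of the "many test sets" idea concrete, here is a minimal sketch. It only draws the random test-set flags and summarizes the spread of Rfree afterwards; the refinements themselves are not run here. The function names (make_test_flags, rfree_spread), the 20000-reflection data set size and the 5% fraction are illustrative assumptions, not anything taken from phenix.refine or another program. The selection is purely random and does no explicit resolution binning.

```python
import random
import statistics


def make_test_flags(n_reflections, fraction=0.05, seed=0):
    """Return a list of booleans; True marks a test (free) reflection.

    Hypothetical helper: picks round(n_reflections * fraction) reflections
    at random, reproducibly for a given seed.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_reflections * fraction))
    test_idx = set(rng.sample(range(n_reflections), n_test))
    return [i in test_idx for i in range(n_reflections)]


def rfree_spread(rfree_values):
    """Mean and sample standard deviation of Rfree over the ensemble."""
    return statistics.mean(rfree_values), statistics.stdev(rfree_values)


# 100 different test sets for an assumed data set of 20000 reflections.
# Each flag set would be handed to its own, separate refinement run.
flag_sets = [make_test_flags(20000, fraction=0.05, seed=s) for s in range(100)]
```

Once the 100 refinements have finished, passing their Rfree values to rfree_spread gives a mean and a standard deviation. The latter can serve as a rough estimate of the uncertainty that comes from the choice of test set.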
A paper by Praznikar & Turk in an upcoming Acta D will provide a methodological solution to this. It may be a long while, though, before it becomes routine in refinement programs.

Pavel

On Fri, Nov 21, 2014 at 8:22 AM, Keller, Jacob <[email protected]> wrote:

> Right about the 1000 in that case, but also Rfree with 5% would be
> statistically poor. I guess one would be stuck in that case.
>
> JPK
>
> *From:* Pavel Afonine [mailto:[email protected]]
> *Sent:* Friday, November 21, 2014 11:16 AM
> *To:* Keller, Jacob
> *Cc:* [email protected]
> *Subject:* Re: [ccp4bb] Free Reflections as Percent and not a Number
>
> Oh I see, I thought the answer follows from that. A fraction is better
> (or maybe a fraction with a cap). Hardwiring a number may not always
> work. For small crystals, small data sets or incomplete data sets, say,
> 1000 reflections may mean 50% of the data set.
>
> All the best,
> Pavel
>
> On Fri, Nov 21, 2014 at 8:09 AM, Keller, Jacob <[email protected]>
> wrote:
>
> Agree with all of this, but how does it reflect on the original question
> of whether to use a percent or an absolute number?
>
> JPK
>
> *From:* Pavel Afonine [mailto:[email protected]]
> *Sent:* Friday, November 21, 2014 11:02 AM
> *To:* Keller, Jacob
> *Cc:* [email protected]
> *Subject:* Re: [ccp4bb] Free Reflections as Percent and not a Number
>
> Hello,
>
> the choice of the size of the set of free (or test, whatever you like to
> call them) reflections is important for three different purposes:
>
> - estimation of parameters for the ML target for refinement;
> - map calculation (coefficients m and D in 2mFo-DFc or mFo-DFc maps are
>   calculated using test reflections);
> - validation (calculation of Rfree).
>
> It is important that free reflections are evenly distributed across the
> whole resolution range, and that each sufficiently thin resolution bin
> contains at least 50 test reflections, so that the estimation of ML
> parameters is robust and reliable. A "sufficiently thin resolution bin"
> is one in which the ML parameters can be assumed constant.
>
> Smaller test sets will result in less stable refinements (the refinement
> outcome will strongly depend on the choice of test set).
>
> Larger test sets will damage map quality (unless all reflections are used
> in map calculation).
>
> The size of the free set needs to be sufficiently large so that Rfree is
> statistically meaningful.
>
> Nothing new is said above, it's all documented in the literature!
>
> Pavel
>
> On Thu, Nov 20, 2014 at 2:43 PM, Keller, Jacob <[email protected]>
> wrote:
>
> Dear Crystallographers,
>
> I thought that for reliable values for Rfree, one needs only to satisfy
> counting statistics, and therefore using at most a couple thousand
> reflections should always be sufficient. Almost always, however, some
> seemingly arbitrary percentage of reflections is used, say 5%. Is there
> any rationale for using a percentage rather than some absolute number
> like 1000?
>
> All the best,
>
> Jacob
