Hi Dirk

I think cross-validation changed our ideas!  Before Rfree, the statistics
of refinement were concerned with the 'number of degrees of freedom':

               Ndof  = Nobs - N'par

since this is the expectation of the properly weighted least-squares
residual (chi-square or <Dinc> in the paper).  If we substitute the
effective number of parameters N'par:

               N'par = Npar - Ncon

where Ncon is the actual number of constraints plus the effective
number of restraints, we get:

               Ndof  = Nobs - (Npar - Ncon)
                     = Nobs + Ncon - Npar

Thus if you are unconcerned with overfitting it _appears_ that
constraints/restraints have the effect of increasing Nobs.  However in
fact what's happening is that Ncon reduces Npar.
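
To make the bookkeeping concrete, here is a small Python sketch.  The
counts are made up purely for illustration and the function names are
mine, not from the paper.  It shows that the two readings of Ndof
(Ncon added to Nobs, or Ncon subtracted from Npar) are the same number:

```python
def effective_params(n_par, n_con):
    """N'par = Npar - Ncon (constraints plus effective restraints)."""
    return n_par - n_con


def ndof(n_obs, n_par, n_con):
    """Ndof = Nobs - N'par."""
    return n_obs - effective_params(n_par, n_con)


# Hypothetical counts, chosen only to illustrate the arithmetic.
n_obs, n_par, n_con = 20000, 8000, 5000

# Reading 1: Ncon reduces the number of parameters.
as_fewer_params = ndof(n_obs, n_par, n_con)

# Reading 2: Ncon is added to the number of observations.
as_more_obs = (n_obs + n_con) - n_par

print(as_fewer_params, as_more_obs)
```

Both readings give the same Ndof, which is why the working residual
alone cannot tell them apart.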

Post-Rfree things look different because now the expected free
residual (<Dfree> in the paper) is proportional to:

               <Dfree> ~ Nobs + (Npar - Ncon)

Now it's still true that Ncon reduces Npar, but it's no longer true
that Ncon increases Nobs: here Ncon enters with the opposite sign to
Nobs, so it cannot be regrouped as extra observations.
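
As a rough numerical check, here is a Python sketch with made-up counts
(the function name is mine; the proportionality constant is left out, so
only the Nobs + (Npar - Ncon) term is compared).  Treating Ncon as extra
observations gives a different number for the free-residual term, so the
two readings are no longer interchangeable:

```python
def dfree_term(n_obs, n_par, n_con):
    """Term that <Dfree> is proportional to: Nobs + (Npar - Ncon)."""
    return n_obs + (n_par - n_con)


# Hypothetical counts, chosen only to illustrate the arithmetic.
n_obs, n_par, n_con = 20000, 8000, 5000

# Ncon treated as reducing Npar (the expression quoted above).
as_fewer_params = dfree_term(n_obs, n_par, n_con)

# Ncon (wrongly) treated as extra observations: (Nobs + Ncon) + Npar.
as_more_obs = (n_obs + n_con) + n_par

print(as_fewer_params, as_more_obs)
```

With these counts the first reading gives 23000 and the second 33000;
only the first matches the expression for <Dfree>.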

Cheers

-- Ian

On Mon, Jan 31, 2011 at 11:18 AM, Dirk Kostrewa
<[email protected]> wrote:
> Dear Ian & other CCP4ers,
>
> I want to get a riddle about counting geometrical restraints solved, which
> emerged in my head after a recent discussion on this board about the effect
> of NCS on the data:parameter ratio. This discussion quickly centered around
> the 1998 Acta Cryst paper about R-factor ratios [1]. So, here is my riddle:
>
> On one hand, geometrical restraints can be counted as observations.
> Refinement programs use differences between model geometry and ideal
> geometry restraints as least-squares targets, in a similar way to
> differences between model structure factor amplitudes and observed structure
> factor amplitudes. Model refinement is possible using geometrical restraints
> only, in the complete absence of observed structure factor amplitudes
> (idealization; whether this makes sense is a different question).
> Geometrical restraints are also counted as observations in [1], both in
> Table 1 and in the text (for example in formula 2).
>
> On the other hand, it is shown in that paper, summarized in Table 2, that
> for the Rfree/Rwork ratios, geometrical restraints effectively reduce the
> number of refinement parameters, with a smooth transition from restraints to
> constraints via the residual term Drest. This implies that geometrical
> restraints can be counted as reducing the number of parameters, not as
> increasing the number of observations, which was also brought up as an
> argument in the aforementioned discussion.
>
> Thus, on one hand, geometrical restraints can be counted as observations, on
> the other hand they can be counted as reducing the number of parameters. The
> riddle for me is that these two ways of counting are mutually exclusive
> alternatives. So, which one is the right one?
>
> I would be grateful if you, Ian, or any other crystallographer on this
> board could help me (and maybe others) to solve this riddle.
>
> Best regards,
>
> Dirk.
>
> [1] Tickle, Laskowski, Moss. "Rfree and the Rfree ratio. I. Derivation of
> expected values of cross-validation residuals used in macromolecular
> least-squares refinement", Acta Cryst., D54, 547-557 (1998)
>
> --
>
> *******************************************************
> Dirk Kostrewa
> Gene Center Munich, A5.07
> Department of Biochemistry
> Ludwig-Maximilians-Universität München
> Feodor-Lynen-Str. 25
> D-81377 Munich
> Germany
> Phone:        +49-89-2180-76845
> Fax:  +49-89-2180-76999
> E-mail:       [email protected]
> WWW:  www.genzentrum.lmu.de
> *******************************************************
>
