Re: [ccp4bb] Effect of NCS on estimate of data:parameter ratio

Ian Tickle Mon, 20 Sep 2010 04:23:04 -0700

Hi Dirk

First, constraints are just a special case of restraints in the limit
of infinite weights; in fact one way of getting constraints is simply
to use restraints with very large weights (though not so large that
you get rounding problems). These 'pseudo-constraints' will be
indistinguishable in effect from the 'real thing'.  So why treat
restraints and constraints differently as far as the statistics are
concerned?  The difference is purely one of implementation.

Second, restraints are not interchangeable 1-for-1 with X-ray data as
far as the statistics are concerned: N restraints cannot be considered
as equivalent to N X-ray data, which would be the implication of
adding together the number of restraints and the number of X-ray data.

This can be seen in the estimation of the expected values of the
residuals (chi-squared) for the working & test sets, which are used to
estimate the expected Rfree.  If you take a look at our 1998 AC paper
(D54, 547-557), Table 2 (p.551), the last row of the table (labelled
'RGfree/RG') shows the expected residuals for the working set
(denominator) and test set (numerator) for the cases of no restraints,
restrained and constrained refinement:

No restraints (or constraints):

<Dwork> = f - m
<Dfree>  = f + m

Restrained:

<Dwork> = f - (m - r + Drest)
<Dfree>  = f + (m - r + Drest)

Constrained:

<Dwork> = f - (m - r)
<Dfree>  = f + (m - r)

where:

<Dwork> = expected working set residual (chi-squared),
<Dfree> = expected test set residual (chi-squared),
f   = no of reflections in working set,
m = no of parameters,
r   = no of restraints and/or constraints,
Drest = restraint residual (chi-squared).

The constrained case is obviously just a special case of the
restrained case with Drest = 0, i.e. in the constrained case the
difference between the refined and target values is zero, and the 'no
restraints' case is a special case of this with r = 0.  We can
generalise all of this by writing simply:

<Dwork> = f - m'
<Dfree>  = f + m'

where m' is the effective no of parameters corrected for restraints
and/or constraints (m' = m - r + Drest); the effective no of
parameters is reduced whether you're using restraints or constraints.
In the case where you had both restraints and constraints, r would be
the total no of restraints + constraints; however, constraints
contribute nothing to Drest.  The 'effectiveness' of a restraint
depends on its contribution to Drest (Z^2): a smaller value means it's
more effective.  A contribution of Z^2 = 1 to Drest completely cancels
the effect of increasing r by 1 by adding the restraint (i.e. the
restraint has no effect).
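
The bookkeeping above can be sketched in a few lines of Python.  This
is only an illustration of the formulas as written in this message:
the function names and the numbers are hypothetical, not taken from
the paper or from any refinement program.

```python
def effective_parameters(m, r=0, d_rest=0.0):
    """Effective parameter count m' = m - r + Drest.

    m      : number of refined parameters
    r      : number of restraints and/or constraints
    d_rest : restraint residual (sum of Z^2 over restraints);
             constraints contribute nothing to it
    """
    return m - r + d_rest


def expected_residuals(f, m, r=0, d_rest=0.0):
    """Return (<Dwork>, <Dfree>) = (f - m', f + m')."""
    m_eff = effective_parameters(m, r, d_rest)
    return f - m_eff, f + m_eff


# Illustrative (made-up) numbers: 1000 working reflections, 400 parameters.
unrestrained = expected_residuals(1000, 400)
# 100 constraints: Drest = 0, so m' drops by exactly 100.
constrained = expected_residuals(1000, 400, r=100, d_rest=0.0)
# 100 restraints each contributing Z^2 = 1: the restraints cancel out.
ineffective = expected_residuals(1000, 400, r=100, d_rest=100.0)
```

With these assumed numbers the constrained case gives a larger working
set residual and a smaller test set residual than the unrestrained
case, while restraints with Z^2 = 1 each leave both unchanged.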

This incidentally shows that the effect of over-fitting (adding
redundant effective parameters) is to reduce the working set and
increase the test set residuals.  If you consider the working set
residual in the general case:

<Dwork> = f - (m - r + Drest) = f + r - m - Drest

it certainly appears from this that the number of X-ray data (f) and
the number of restraints (r) are being added.

However if you consider the test set residual:

<Dfree>  = f + (m - r + Drest) = f - r + m + Drest

this is clearly not the case.  All you can say is that the effective
number of parameters is reduced by the number of restraints +
constraints.
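
To make the asymmetry concrete, here is a small Python sketch
comparing 'N more reflections' with 'N more restraints'.  The numbers
are made up, the helper name is hypothetical, and the extra restraints
are assumed (as an idealisation) to add nothing to Drest.

```python
def expected_residuals(f, m, r, d_rest):
    """Return (<Dwork>, <Dfree>) = (f - m', f + m'), m' = m - r + Drest."""
    m_eff = m - r + d_rest
    return f - m_eff, f + m_eff


# Illustrative baseline: f = 1000, m = 400, r = 100, Drest = 50.
base = expected_residuals(f=1000, m=400, r=100, d_rest=50.0)

# Add N = 10 reflections to the working set.
more_data = expected_residuals(f=1010, m=400, r=100, d_rest=50.0)

# Add N = 10 restraints instead (assumed to contribute 0 to Drest).
more_restraints = expected_residuals(f=1000, m=400, r=110, d_rest=50.0)

# Change in (<Dwork>, <Dfree>) relative to the baseline.
delta_data = (more_data[0] - base[0], more_data[1] - base[1])
delta_restraints = (more_restraints[0] - base[0], more_restraints[1] - base[1])
```

Under these assumptions both changes shift <Dwork> by the same amount,
but they shift <Dfree> in opposite directions, which is why restraints
cannot simply be counted as extra observations.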

Cheers

-- Ian

On Mon, Sep 20, 2010 at 9:20 AM, Dirk Kostrewa
<[email protected]> wrote:
>  Hi Ian,
>
> Am 19.09.10 15:25, schrieb Ian Tickle:
>>
>> Hi Florian,
>>
>> Tight NCS restraints or NCS constraints (they are essentially the same
>> thing in effect if not in implementation) both reduce the effective
>> parameter count on a 1-for-1 basis.
>>
>> Restraints should not be considered as being added to the pool of
>> X-ray observations in the calculation of the obs/param ratio, simply
>> because restraints and X-ray observations can in no way be regarded as
>> interchangeable (increasing the no of restraints by N is not
>> equivalent to increasing the no of reflections by N).  This becomes
>> apparent when you try to compute the expected Rfree: the effective
>> contribution of the restraints has to be subtracted from the parameter
>> count, not added to the observation count.
>
> I always understood the difference between constraints and restraints such,
> that a constraint reduces the number of parameters by fixing certain
> parameters, whereas restraints are target values for parameters and as such
> can be counted as observations, similarly to the Fobs, which are target
> values for the Fcalc (although with different weights). I don't see what is
> wrong with this view. Do I misunderstand something?
>
> Best regards,
>
> Dirk.
>
>> The complication is that a 'weak' restraint is equivalent to less than
>> 1 parameter (I call it the 'effective no of restraints': it can be
>> calculated from the chi-squared for the restraint).  Obviously no
>> restraint is equivalent to no parameter, so you can think of it as a
>> continuous sliding scale from no restraint (effective contribution to
>> be subtracted from parameter count = 0) through weak restraint (0 <
>> contribution < 1) through tight restraint (count ~= 1) to constraint
>> (count = 1).
>>
>> Cheers
>>
>> -- Ian
>>
>> On Sat, Sep 18, 2010 at 9:23 PM, Florian Schmitzberger
>> <[email protected]>  wrote:
>>>
>>> Dear All,
>>>
>>> I would have a question regarding the effect of non-crystallographic
>>> symmetry (NCS) on the data:parameter ratio in refinement.
>>>
>>> I am working with X-ray data to a maximum resolution of 4.1-4.4
>>> Angstroem, 79 % solvent content, in P6222 space group; with 22 300
>>> unique reflections and expected 1132 amino acid residues in the
>>> asymmetric unit, proper 2-fold rotational NCS (SAD phased and no
>>> high-resolution molecular replacement or homology model available).
>>>
>>> Assuming refinement of x,y,z, B and a polyalanine model (i.e. ca. 5700
>>> atoms), this would equal an observation:parameter ratio of roughly 1:1.
>>> This I think would be equivalent to a "normal" protein with 50 % solvent
>>> content, diffracting to better than 3 Angstroem resolution (from the
>>> statistics I could find, at that resolution a mean data:parameter ratio
>>> of ca. 0.9:1 can be expected for refinement of x,y,z, and individual
>>> isotropic B; ignoring bond angle/length geometrical restraints at the
>>> moment).
>>>
>>> My question is how I could factor in the 2-fold rotational NCS for the
>>> estimate of the observations, assuming tight NCS restraints (or even
>>> constraint). It is normally assumed NCS reduces the noise by a factor of
>>> the square root of the NCS order, but I would be more interested how much
>>> it adds on the observation side (used as a restraint) or reduction of the
>>> parameters (used as a constraint). I don't suppose it would be correct to
>>> assume that the 2-fold NCS would halve the number of parameters to refine
>>> (assuming an NCS constraint)?
>>>
>>> Regards,
>>>
>>> Florian
>>>
>>> -----------------------------------------------------------
>>> Florian Schmitzberger
>>> Biological Chemistry and Molecular Pharmacology
>>> Harvard Medical School
>>> 250 Longwood Avenue, SGM 130
>>> Boston, MA 02115, US
>>> Tel: 001 617 432 5602
>>>
>
> --
>
> *******************************************************
> Dirk Kostrewa
> Gene Center Munich, A5.07
> Department of Biochemistry
> Ludwig-Maximilians-Universität München
> Feodor-Lynen-Str. 25
> D-81377 Munich
> Germany
> Phone:  +49-89-2180-76845
> Fax:    +49-89-2180-76999
> E-mail: [email protected]
> WWW:    www.genzentrum.lmu.de
> *******************************************************
>
