Re: [ccp4bb] Effect of NCS on estimate of data:parameter ratio

Ian Tickle Tue, 21 Sep 2010 02:31:06 -0700

Dirk,

Apologies, my last e-mail was incomplete; there was one thing I should
have added:


From Table 2 in the paper the expected Rfree/Rwork ratio comes out as:

      < Rfree / Rwork >  =  sqrt( (f+m') / (f-m') ) = sqrt( (x+1) / (x-1) )

where x = f / m' = no of X-ray data / effective no of parameters, i.e.
what I'm calling the 'observation/parameter ratio'.  Note what happens
as x -> 1: the denominator x - 1 goes to zero and the expected ratio
blows up!
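
A minimal Python sketch of the expression above (the function name is illustrative, not taken from any refinement program). It takes f and m' directly, works through x = f / m', and refuses x <= 1, where the expected working-set residual f - m' is no longer positive.

```python
import math


def expected_rfree_rwork_ratio(f: float, m_eff: float) -> float:
    """Expected Rfree/Rwork = sqrt((f + m') / (f - m')) = sqrt((x + 1) / (x - 1))."""
    x = f / m_eff  # observation/parameter ratio x = f / m'
    if x <= 1.0:
        # f - m' <= 0: the expected working-set residual is no longer positive
        raise ValueError("the expression needs x = f / m' > 1")
    return math.sqrt((x + 1.0) / (x - 1.0))


if __name__ == "__main__":
    # f is passed as x with m' = 1, so only the ratio x matters here.
    for x in (10.0, 4.0, 2.0, 1.5, 1.1, 1.01):
        print(f"x = {x:5.2f}   <Rfree/Rwork> = {expected_rfree_rwork_ratio(x, 1.0):6.3f}")
```

At x = 2 the expected ratio is already sqrt(3), about 1.73, and at x = 1.01 it exceeds 14.
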

This shows the direct relationship between Rfree and the obs/param
ratio defined in this way.  Note that this definition comes straight
out of the algebra; I didn't have to introduce it.  If the form of the
obs/param ratio that you are suggesting came out of the algebra in the
same way, I would be happy to accept it, but AFAICS it doesn't.

Cheers

-- Ian

On Mon, Sep 20, 2010 at 12:22 PM, Ian Tickle wrote:
> Hi Dirk
>
> First, constraints are just a special case of restraints in the limit
> of infinite weights.  In fact, one way of getting constraints is simply
> to use restraints with very large weights (though not so large that
> you get rounding problems).  These 'pseudo-constraints' will be
> indistinguishable in effect from the 'real thing'.  So why treat
> restraints and constraints differently as far as the statistics are
> concerned?  The difference is purely one of implementation.
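
The limiting argument can be seen in a toy one-parameter least-squares problem (purely illustrative; the function name and numbers are hypothetical). One observation pulls the parameter p towards a, and a restraint of weight w pulls it towards the target b. Minimising S(p) = (p - a)^2 + w (p - b)^2 gives p = (a + w b) / (1 + w), which tends to the constrained value p = b as w grows.

```python
def restrained_estimate(a: float, b: float, w: float) -> float:
    """Minimise S(p) = (p - a)**2 + w * (p - b)**2 for a single parameter p.

    a: value favoured by the (toy) observation
    b: restraint target value
    w: restraint weight relative to the observation (w = 0 means unrestrained)
    """
    # Setting dS/dp = 2(p - a) + 2w(p - b) = 0 gives this closed form.
    return (a + w * b) / (1.0 + w)


if __name__ == "__main__":
    a, b = 1.0, 0.0  # the observation says 1.0, the restraint target says 0.0
    for w in (0.0, 1.0, 10.0, 1e3, 1e6):
        p = restrained_estimate(a, b, w)
        print(f"w = {w:>9g}   p = {p:.6f}   |p - b| = {abs(p - b):.2e}")
```

The closed form sidesteps the numerical side of the story; in a real refinement the limit is approached through the normal matrix, whose conditioning worsens as weights become very large, which is one likely source of the rounding problems mentioned above.
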
>
> Second, restraints are not interchangeable 1-for-1 with X-ray data as
> far as the statistics are concerned: N restraints cannot be considered
> as equivalent to N X-ray data, which would be the implication of
> adding together the number of restraints and the number of X-ray data.
>
> This can be seen in the estimation of the expected values of the
> residuals (chi-squared) for the working & test sets, which are used to
> estimate the expected Rfree.  If you take a look at our 1998 Acta Cryst. paper
> (D54, 547-557), Table 2 (p.551), the last row of the table (labelled
> 'RGfree/RG') shows the expected residuals for the working set
> (denominator) and test set (numerator) for the cases of no restraints,
> restrained and constrained refinement:
>
> No restraints (or constraints):
>
> <Dwork> = f - m
> <Dfree>  = f + m
>
> Restrained:
>
> <Dwork> = f - (m - r + Drest)
> <Dfree>  = f + (m - r + Drest)
>
> Constrained:
>
> <Dwork> = f - (m - r)
> <Dfree>  = f + (m - r)
>
> where:
>
> <Dwork> = expected working set residual (chi-squared),
> <Dfree> = expected test set residual (chi-squared),
> f       = no of reflections in working set,
> m       = no of parameters,
> r       = no of restraints and/or constraints,
> Drest   = restraint residual (chi-squared).
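
A minimal sketch encoding the three cases listed above in one function; the unrestrained and constrained cases are obtained by setting r = 0 and Drest = 0 as described in the text. The function name and the counts in the usage lines (including the Drest value) are hypothetical.

```python
def expected_residuals(f: float, m: float, r: float = 0, d_rest: float = 0.0) -> tuple[float, float]:
    """Return (<Dwork>, <Dfree>) for the three tabulated cases.

    No restraints:  r = 0, d_rest = 0   ->  (f - m, f + m)
    Restrained:     r > 0, d_rest > 0   ->  (f - (m - r + Drest), f + (m - r + Drest))
    Constrained:    r > 0, d_rest = 0   ->  (f - (m - r), f + (m - r))
    """
    m_eff = m - r + d_rest  # effective number of parameters
    return f - m_eff, f + m_eff


if __name__ == "__main__":
    f, m, r = 10_000, 4_000, 3_000  # hypothetical counts
    print("no restraints:", expected_residuals(f, m))
    print("restrained:   ", expected_residuals(f, m, r, d_rest=600.0))  # hypothetical Drest
    print("constrained:  ", expected_residuals(f, m, r))
```
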
>
> The constrained case is obviously just a special case of the
> restrained case with Drest = 0, i.e. in the constrained case the
> difference between the refined and target values is zero, and the 'no
> restraints' case is a special case of this with r = 0.  We can
> generalise all of this by writing simply:
>
> <Dwork> = f - m'
> <Dfree>  = f + m'
>
> where m' is the effective no of parameters corrected for restraints
> and/or constraints (m' = m - r + Drest); the effective no of
> parameters is reduced whether you're using restraints or constraints.
> In the case where you had both restraints and constraints, r would be
> the total no of restraints + constraints; however, constraints
> contribute nothing to Drest.  The 'effectiveness' of a restraint
> depends on its contribution to Drest (Z^2): a smaller value means it's
> more effective.  A contribution of Z^2 = 1 to Drest completely cancels
> the effect of increasing r by 1 by adding the restraint (i.e. the
> restraint has no effect).
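
The bookkeeping for m' can be sketched per restraint, assuming (as the Z^2 notation suggests) that each restraint contributes Z^2 to Drest, with Z its deviation from the target in units of its sigma. Each restraint then lowers m' by 1 - Z^2, and each constraint by exactly 1. Function names and the example Z values are hypothetical.

```python
def effective_parameter_count(m: int, restraint_z: list[float], n_constraints: int = 0) -> float:
    """m' = m - r + Drest, with r = number of restraints + constraints.

    restraint_z:   assumed normalised deviation Z of each restraint from its target,
                   so that the restraint contributes Z**2 to Drest
    n_constraints: constraints count towards r but add nothing to Drest
    """
    r = len(restraint_z) + n_constraints
    d_rest = sum(z * z for z in restraint_z)  # restraint chi-squared
    return m - r + d_rest


def restraint_contribution(z: float) -> float:
    """Reduction of m' by a single restraint: 1 - Z**2 (no effect when Z**2 = 1)."""
    return 1.0 - z * z


if __name__ == "__main__":
    z_values = [0.0, 0.5, 1.0]  # hypothetical: very tight, moderate, ineffective
    print([restraint_contribution(z) for z in z_values])
    print(effective_parameter_count(100, z_values, n_constraints=2))
```

With these values the three restraints remove 1, 0.75 and 0 parameters, and together with the two constraints they reduce m = 100 to m' = 96.25.
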
>
> This incidentally shows that the effect of over-fitting (adding
> redundant effective parameters) is to reduce the working set and
> increase the test set residuals.  If you consider the working set
> residual in the general case:
>
> <Dwork> = f - (m - r + Drest) = f + r - m - Drest
>
> it certainly appears from this that the number of X-ray data (f) and
> the number of restraints (r) are being added.
>
> However if you consider the test set residual:
>
> <Dfree>  = f + (m - r + Drest) = f - r + m + Drest
>
> this is clearly not the case.  All you can say is that the effective
> number of parameters is reduced by the number of restraints +
> constraints.
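
The sign argument can be checked numerically. This sketch (hypothetical name and numbers) asks which 'observation count' N the unrestrained forms N - m and N + m would need in order to reproduce the restrained <Dwork> and <Dfree>. If restraints simply added to the observations, both would give the same N; instead they give f + r - Drest and f - r + Drest, which differ by 2(r - Drest).

```python
def implied_observation_counts(f: float, m: float, r: float, d_rest: float) -> tuple[float, float]:
    """Observation count N that the unrestrained forms N - m and N + m would need
    in order to reproduce the restrained <Dwork> and <Dfree> respectively."""
    m_eff = m - r + d_rest
    d_work = f - m_eff   # = f + r - m - Drest
    d_free = f + m_eff   # = f - r + m + Drest
    n_work = d_work + m  # = f + r - Drest
    n_free = d_free - m  # = f - r + Drest
    return n_work, n_free


if __name__ == "__main__":
    n_work, n_free = implied_observation_counts(f=10_000, m=4_000, r=3_000, d_rest=600)
    print(n_work, n_free, n_work - n_free)  # the difference equals 2 * (r - Drest)
```

Here the working set 'sees' 12400 observations and the test set only 7600, so no single count of X-ray data plus restraints describes both residuals.
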
>
> Cheers
>
> -- Ian
>
> On Mon, Sep 20, 2010 at 9:20 AM, Dirk Kostrewa wrote:
>> Hi Ian,
>>
>> On 19.09.10 15:25, Ian Tickle wrote:
>>>
>>> Hi Florian,
>>>
>>> Tight NCS restraints or NCS constraints (they are essentially the same
>>> thing in effect if not in implementation) both reduce the effective
>>> parameter count on a 1-for-1 basis.
>>>
>>> Restraints should not be considered as being added to the pool of
>>> X-ray observations in the calculation of the obs/param ratio, simply
>>> because restraints and X-ray observations can in no way be regarded as
>>> interchangeable (increasing the no of restraints by N is not
>>> equivalent to increasing the no of reflections by N).  This becomes
>>> apparent when you try to compute the expected Rfree: the effective
>>> contribution of the restraints has to be subtracted from the parameter
>>> count, not added to the observation count.
>>
>> I always understood the difference between constraints and restraints to
>> be that a constraint reduces the number of parameters by fixing certain
>> parameters, whereas restraints are target values for parameters and as such
>> can be counted as observations, just like the Fobs, which are target
>> values for the Fcalc (although with different weights). I don't see what is
>> wrong with this view. Am I misunderstanding something?
>>
>> Best regards,
>>
>> Dirk.
>>
>>> The complication is that a 'weak' restraint is equivalent to less than
>>> 1 parameter (I call this the 'effective no of restraints': it can be
>>> calculated from the chi-squared for the restraint).  Obviously having
>>> no restraint subtracts nothing from the parameter count, so you can
>>> think of it as a continuous sliding scale from no restraint (effective
>>> contribution to be subtracted from parameter count = 0) through weak
>>> restraint (0 < contribution < 1) and tight restraint (count ~= 1) to
>>> constraint (count = 1).
>>>
>>> Cheers
>>>
>>> -- Ian
>>>
>>> On Sat, Sep 18, 2010 at 9:23 PM, Florian Schmitzberger wrote:
>>>>
>>>> Dear All,
>>>>
>>>> I have a question regarding the effect of non-crystallographic
>>>> symmetry (NCS) on the data:parameter ratio in refinement.
>>>>
>>>> I am working with X-ray data to a maximum resolution of 4.1-4.4
>>>> Angstroem, 79 % solvent content, in space group P6222, with 22 300
>>>> unique reflections, an expected 1132 amino acid residues in the
>>>> asymmetric unit and proper 2-fold rotational NCS (SAD phased; no
>>>> high-resolution molecular replacement or homology model available).
>>>>
>>>> Assuming refinement of x, y, z, B and a polyalanine model (i.e. ca.
>>>> 5700 atoms), this would give an observation:parameter ratio of roughly
>>>> 1:1. I think this would be equivalent to a "normal" protein with 50 %
>>>> solvent content diffracting to better than 3 Angstroem resolution (from
>>>> the statistics I could find, at that resolution a mean data:parameter
>>>> ratio of ca. 0.9:1 can be expected for refinement of x, y, z and
>>>> individual isotropic B, ignoring bond angle/length geometrical
>>>> restraints for the moment).
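
The arithmetic behind this estimate can be reproduced in a few lines, assuming five atoms per polyalanine residue (N, CA, C, O, CB; no hydrogens or terminal atoms) and four refined parameters (x, y, z, B) per atom. These per-residue and per-atom counts are assumptions that reproduce the 'ca. 5700 atoms' figure; they are not spelled out in the message.

```python
N_REFLECTIONS = 22_300      # unique reflections quoted in the message
N_RESIDUES = 1_132          # expected residues in the asymmetric unit
ATOMS_PER_RESIDUE = 5       # polyalanine: N, CA, C, O, CB (assumption: no H, no OXT)
PARAMS_PER_ATOM = 4         # x, y, z and one isotropic B

n_atoms = N_RESIDUES * ATOMS_PER_RESIDUE    # 5660, i.e. "ca. 5700"
n_params = n_atoms * PARAMS_PER_ATOM        # 22640
ratio = N_REFLECTIONS / n_params            # observations per parameter

print(f"atoms = {n_atoms}, parameters = {n_params}, obs:param = {ratio:.2f}:1")
```

This gives about 0.98 observations per parameter, i.e. roughly 1:1 as stated.
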
>>>>
>>>> My question is how I could factor the 2-fold rotational NCS into the
>>>> estimate of the observations, assuming tight NCS restraints (or even
>>>> constraints). It is normally assumed that NCS reduces the noise by a
>>>> factor of the square root of the NCS order, but I am more interested in
>>>> how much it adds on the observation side (used as a restraint) or
>>>> removes from the parameters (used as a constraint). I don't suppose it
>>>> would be correct to assume that the 2-fold NCS would halve the number
>>>> of parameters to refine (assuming an NCS constraint)?
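
Applying the 1-for-1 rule from the replies above gives a rough answer for this case, under loudly stated assumptions: a 2-fold NCS constraint is taken to tie every x, y, z, B of the second copy to the first (so r is half the parameter count), the few NCS-operator parameters are ignored, all 22300 reflections are used as f (no test set removed), and the Drest value for the restraint variant is purely hypothetical. The parameter count is the polyalanine estimate of 22640; the Rfree/Rwork expression is the one from the top of the thread.

```python
import math

F_OBS = 22_300      # reflections, taken as f (test-set removal ignored here)
M_TOTAL = 22_640    # x, y, z, B for a 5660-atom polyalanine model


def ncs_effective_params(m: int, ncs_order: int, d_rest: float = 0.0) -> float:
    """m' = m - r + Drest, assuming NCS ties all copies after the first to it.

    Assumes m splits evenly over the copies and ignores NCS-operator parameters.
    d_rest = 0 corresponds to a strict NCS constraint, d_rest > 0 to NCS restraints.
    """
    r = m - m // ncs_order  # one constraint per parameter of each extra copy
    return m - r + d_rest


for label, d_rest in (("constraint", 0.0), ("restraint, hypothetical Drest = 500", 500.0)):
    m_eff = ncs_effective_params(M_TOTAL, 2, d_rest)
    x = F_OBS / m_eff
    print(f"{label}: m' = {m_eff:.0f}, x = {x:.2f}, "
          f"<Rfree/Rwork> = {math.sqrt((x + 1) / (x - 1)):.2f}")
```

Without NCS (and without geometric restraints) x is just below 1, where that expression has no meaning; with the assumed constraint it rises to about 2.
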
>>>>
>>>> Regards,
>>>>
>>>> Florian
>>>>
>>>> -----------------------------------------------------------
>>>> Florian Schmitzberger
>>>> Biological Chemistry and Molecular Pharmacology
>>>> Harvard Medical School
>>>> 250 Longwood Avenue, SGM 130
>>>> Boston, MA 02115, US
>>>> Tel: 001 617 432 5602
>>>>
>>
>> --
>>
>> *******************************************************
>> Dirk Kostrewa
>> Gene Center Munich, A5.07
>> Department of Biochemistry
>> Ludwig-Maximilians-Universität München
>> Feodor-Lynen-Str. 25
>> D-81377 Munich
>> Germany
>> Phone:  +49-89-2180-76845
>> Fax:    +49-89-2180-76999
>> WWW:    www.genzentrum.lmu.de
>> *******************************************************
>>
>