Dirk,

Apologies, my last e-mail was incomplete. There was one thing I should have added:
From Table 2 in the paper the expected Rfree/Rwork ratio comes out as:

  < Rfree / Rwork > = sqrt( (f+m') / (f-m') ) = sqrt( (x+1) / (x-1) )

where x = f / m' = no of X-ray data / effective no of parameters, i.e.
what I'm calling the 'observation/parameter ratio'. Note what happens
as x -> 1: the expected ratio diverges! This shows the direct
relationship between Rfree and the obs/param ratio defined in this way.
Note that this definition comes straight out of the algebra; I didn't
have to introduce it. If the form of the obs/param ratio that you are
suggesting came out of the algebra in the same way I would be happy to
accept it, but AFAICS it doesn't.

Cheers

-- Ian

On Mon, Sep 20, 2010 at 12:22 PM, Ian Tickle <ianj...@gmail.com> wrote:
> Hi Dirk
>
> First, constraints are just a special case of restraints in the limit
> of infinite weights, in fact one way of getting constraints is simply
> to use restraints with very large weights (though not so large that
> you get rounding problems). These 'pseudo-constraints' will be
> indistinguishable in effect from the 'real thing'. So why treat
> restraints and constraints differently as far as the statistics are
> concerned: the difference is purely one of implementation.
>
> Second, restraints are not interchangeable 1-for-1 with X-ray data as
> far as the statistics are concerned: N restraints cannot be considered
> as equivalent to N X-ray data, which would be the implication of
> adding together the number of restraints and the number of X-ray data.
>
> This can be seen in the estimation of the expected values of the
> residuals (chi-squared) for the working & test sets, which are used to
> estimate the expected Rfree.
> If you take a look at our 1998 AC paper (D54, 547-557), Table 2
> (p.551), the last row of the table (labelled 'RGfree/RG') shows the
> expected residuals for the working set (denominator) and test set
> (numerator) for the cases of no restraints, restrained and
> constrained refinement:
>
> No restraints (or constraints):
>
>   <Dwork> = f - m
>   <Dfree> = f + m
>
> Restrained:
>
>   <Dwork> = f - (m - r + Drest)
>   <Dfree> = f + (m - r + Drest)
>
> Constrained:
>
>   <Dwork> = f - (m - r)
>   <Dfree> = f + (m - r)
>
> where:
>
>   <Dwork> = expected working set residual (chi-squared),
>   <Dfree> = expected test set residual (chi-squared),
>   f = no of reflections in working set,
>   m = no of parameters,
>   r = no of restraints and/or constraints,
>   Drest = restraint residual (chi-squared).
>
> The constrained case is obviously just a special case of the
> restrained case with Drest = 0, i.e. in the constrained case the
> difference between the refined and target values is zero, and the 'no
> restraints' case is a special case of this with r = 0. We can
> generalise all of this by writing simply:
>
>   <Dwork> = f - m'
>   <Dfree> = f + m'
>
> where m' is the effective no of parameters corrected for restraints
> and/or constraints (m' = m - r + Drest); the effective no of
> parameters is reduced whether you're using restraints or constraints.
> In the case where you had both restraints and constraints r would be
> the total no of restraints + constraints, however constraints
> contribute nothing to Drest. The 'effectiveness' of a restraint
> depends on its contribution to Drest (Z^2), a smaller value means it's
> more effective. A contribution of Z^2 = 1 to Drest completely cancels
> the effect of increasing r by 1 by adding the restraint (i.e. the
> restraint has no effect).
>
> This incidentally shows that the effect of over-fitting (adding
> redundant effective parameters) is to reduce the working set and
> increase the test set residuals.
> If you consider the working set residual in the general case:
>
>   <Dwork> = f - (m - r + Drest) = f + r - m - Drest
>
> it certainly appears from this that the number of X-ray data (f) and
> the number of restraints (r) are being added.
>
> However if you consider the test set residual:
>
>   <Dfree> = f + (m - r + Drest) = f - r + m + Drest
>
> this is clearly not the case. All you can say is that the effective
> number of parameters is reduced by the number of restraints +
> constraints.
>
> Cheers
>
> -- Ian
>
> On Mon, Sep 20, 2010 at 9:20 AM, Dirk Kostrewa
> <kostr...@genzentrum.lmu.de> wrote:
>> Hi Ian,
>>
>> On 19.09.10 15:25, Ian Tickle wrote:
>>>
>>> Hi Florian,
>>>
>>> Tight NCS restraints or NCS constraints (they are essentially the same
>>> thing in effect if not in implementation) both reduce the effective
>>> parameter count on a 1-for-1 basis.
>>>
>>> Restraints should not be considered as being added to the pool of
>>> X-ray observations in the calculation of the obs/param ratio, simply
>>> because restraints and X-ray observations can in no way be regarded as
>>> interchangeable (increasing the no of restraints by N is not
>>> equivalent to increasing the no of reflections by N). This becomes
>>> apparent when you try to compute the expected Rfree: the effective
>>> contribution of the restraints has to be subtracted from the parameter
>>> count, not added to the observation count.
>>
>> I always understood the difference between constraints and restraints
>> to be that a constraint reduces the number of parameters by fixing
>> certain parameters, whereas restraints are target values for
>> parameters and as such can be counted as observations, similarly to
>> the Fobs, which are target values for the Fcalc (although with
>> different weights). I don't see what is wrong with this view. Do I
>> misunderstand something?
>>
>> Best regards,
>>
>> Dirk.
>>
>>> The complication is that a 'weak' restraint is equivalent to less than
>>> 1 parameter (I call it the 'effective no of restraints': it can be
>>> calculated from the chi-squared for the restraint). Obviously no
>>> restraint is equivalent to no parameter, so you can think of it as a
>>> continuous sliding scale from no restraint (effective contribution to
>>> be subtracted from parameter count = 0) through weak restraint
>>> (0 < contribution < 1) through tight restraint (count ~= 1) to
>>> constraint (count = 1).
>>>
>>> Cheers
>>>
>>> -- Ian
>>>
>>> On Sat, Sep 18, 2010 at 9:23 PM, Florian Schmitzberger
>>> <schmitzber...@crystal.harvard.edu> wrote:
>>>>
>>>> Dear All,
>>>>
>>>> I have a question regarding the effect of non-crystallographic
>>>> symmetry (NCS) on the data:parameter ratio in refinement.
>>>>
>>>> I am working with X-ray data to a maximum resolution of 4.1-4.4
>>>> Angstrom, 79 % solvent content, in P6222 space group; with 22 300
>>>> unique reflections and expected 1132 amino acid residues in the
>>>> asymmetric unit, proper 2-fold rotational NCS (SAD phased and no
>>>> high-resolution molecular replacement or homology model available).
>>>>
>>>> Assuming refinement of x,y,z, B and a polyalanine model (i.e. ca.
>>>> 5700 atoms), this would equal an observation:parameter ratio of
>>>> roughly 1:1. This I think would be equivalent to a "normal" protein
>>>> with 50 % solvent content, diffracting to better than 3 Angstrom
>>>> resolution (from the statistics I could find, at that resolution a
>>>> mean data:parameter ratio of ca. 0.9:1 can be expected for
>>>> refinement of x,y,z, and individual isotropic B; ignoring bond
>>>> angle/length geometrical restraints at the moment).
>>>>
>>>> My question is how I could factor in the 2-fold rotational NCS for
>>>> the estimate of the observations, assuming tight NCS restraints (or
>>>> even constraints).
>>>> It is normally assumed that NCS reduces the noise by a factor of
>>>> the square root of the NCS order, but I would be more interested in
>>>> how much it adds on the observation side (used as a restraint) or
>>>> how much it reduces the parameters (used as a constraint). I don't
>>>> suppose it would be correct to assume that the 2-fold NCS would
>>>> halve the number of parameters to refine (assuming an NCS
>>>> constraint)?
>>>>
>>>> Regards,
>>>>
>>>> Florian
>>>>
>>>> -----------------------------------------------------------
>>>> Florian Schmitzberger
>>>> Biological Chemistry and Molecular Pharmacology
>>>> Harvard Medical School
>>>> 250 Longwood Avenue, SGM 130
>>>> Boston, MA 02115, US
>>>> Tel: 001 617 432 5602
>>>>
>>
>> --
>>
>> *******************************************************
>> Dirk Kostrewa
>> Gene Center Munich, A5.07
>> Department of Biochemistry
>> Ludwig-Maximilians-Universität München
>> Feodor-Lynen-Str. 25
>> D-81377 Munich
>> Germany
>> Phone: +49-89-2180-76845
>> Fax: +49-89-2180-76999
>> E-mail: kostr...@genzentrum.lmu.de
>> WWW: www.genzentrum.lmu.de
>> *******************************************************
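
For anyone who wants to play with the numbers, the relations quoted in this thread (m' = m - r + Drest, <Dwork> = f - m', <Dfree> = f + m', and the expected ratio sqrt((f+m')/(f-m'))) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the values in the usage lines are made-up round numbers chosen so the arithmetic is easy to check. They are not taken from the paper, from Florian's data set, or from any refinement program.

```python
import math


def effective_parameters(m, r, d_rest):
    """Effective no of parameters m' = m - r + Drest.

    m      : no of parameters
    r      : no of restraints and/or constraints
    d_rest : restraint residual (chi-squared); 0 for pure constraints
    """
    return m - r + d_rest


def expected_residuals(f, m_eff):
    """Return (<Dwork>, <Dfree>) = (f - m', f + m') for f working-set reflections."""
    return f - m_eff, f + m_eff


def expected_rfree_rwork_ratio(f, m_eff):
    """Expected Rfree/Rwork = sqrt((f + m') / (f - m')).

    The expression is only defined for f > m' (i.e. x = f/m' > 1);
    it grows without bound as f approaches m'.
    """
    if f <= m_eff:
        raise ValueError("need f > m' (obs/param ratio x > 1)")
    return math.sqrt((f + m_eff) / (f - m_eff))


# Hypothetical illustrative numbers (not from the thread):
f, m, r, d_rest = 5000, 4000, 1500, 500.0
m_eff = effective_parameters(m, r, d_rest)
d_work, d_free = expected_residuals(f, m_eff)
print(m_eff, d_work, d_free, expected_rfree_rwork_ratio(f, m_eff))
# prints: 3000.0 2000.0 8000.0 2.0
```

Setting d_rest = 0 reproduces the constrained case, and r = 0 with d_rest = 0 reproduces the unrestrained one. Note that r only ever enters through m', which is the point made above: the restraints are subtracted from the parameter count rather than added to f.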