Re: [ccp4bb] TRUNCATE & CTRUNCATE issues.

Ian Tickle Wed, 04 Dec 2013 02:31:00 -0800

Robbie

That sounds more like a programming error: the issues I've identified are
purely numerical and won't affect the program flow.


Cheers

-- Ian


On 4 December 2013 09:23, Robbie Joosten <[email protected]> wrote:

> Hi Ian,
>
>
> > This probably doesn't affect the MTZ file that comes out of it
> > but it will affect the statistics of the twinning tests.
>
> I'd be interested to see if this solves the cases where ctruncate goes
> into an infinite loop during the twinning tests.
>
> Cheers,
> Robbie
>
> ------------------------------
> Date: Sat, 30 Nov 2013 15:37:44 +0000
> From: [email protected]
>
> Subject: [ccp4bb] TRUNCATE & CTRUNCATE issues.
> To: [email protected]
>
>
>
> Hello All, I'd like to get your views on some changes I'm proposing to
> make to the TRUNCATE source code.  IMO there are some issues with the way
> TRUNCATE does its statistical analyses which need to be fixed.  This
> probably doesn't affect the MTZ file that comes out of it but it will
> affect the statistics of the twinning tests.  I've been meaning to do this
> for some time but it involves some fairly radical changes; now I've finally
> decided to bite the bullet.
>
> One problem is that there seems to be some confusion in the source code
> comments concerning the meaning of what I call the "symmetry enhancement
> factor" (aka "epsilon" or "e" below). This is the point-group dependent
> factor by which the mean intensities of special rows and zones are enhanced
> by symmetry; for example in PG121 the mean I of the 0k0 reflections is
> enhanced by a factor of 2; in PG6 & PG622, it's a factor of 6 for the 00l
> reflections, and so on.
>
> For space groups with screw axes (or glide planes in enantiomorphic SGs,
> i.e. the crystal contains both the asymmetric unit and its mirror image),
> the mean is multiplied by e only if the systematic absences are omitted
> from the sums; if you include them then there is no overall enhancement.
> So rotation and screw axes (or mirrors and glide planes) need to be treated
> differently, yet the CCP4 library code for this (s/rs EPSLN & EPSLON) treat
> them identically (more on this below).  TRUNCATE calls epsilon
> alternatively a "multiplicity" or "weight" which further adds to the
> confusion since multiplicity (m) usually means something completely
> different (it's the number of times a symmetry-equivalent reflection occurs
> in a full hemisphere of data, so for PG222 axial reflections (h00 etc.)
> m=1, for zero layers (hk0 etc.) m=2, and for general hkl m=4: in contrast e
> = 2, 1 and 1 resp.).
>
> This snippet of code from TRUNCATE which accumulates sums in bins
> according to d*^2 illustrates the problem:
>
>       CALL EPSLON(INHKL,WEIGHT,ISYSAB)
> C
>       FF(NT) = FF(NT) + F/WEIGHT
>       SD(NT) = SD(NT) + SS/WEIGHT
>       N(NT) = N(NT) + 1
>       AMULT = NSYM/WEIGHT
> C
> ...
> C     Accumulate sums for Wilson plot.
> C
>       SN(NT) = SN(NT) + AMULT
>       SW(NT) = SW(NT) + AMULT*FFSCAT
>       SR(NT) = SR(NT) + AMULT*Q
>       SI(NT) = SI(NT) + AMULT*F
>
> Here WEIGHT = epsilon (the ISYSAB flag is ignored throughout), so all sums
> are being accumulated with the terms multiplied by 1/e (NSYM is the no. of
> asymmetric units so is constant and doesn't affect the results).  However
> IMO this factor should be applied only to individual intensities or their
> SDs (note that in the above code and that below, F is actually the
> uncorrected intensity just to further confuse you!).  The problem here is
> that for pure rotation axes the law of conservation of energy requires that
> the overall mean I is unchanged.  In any interference phenomenon energy can
> neither be created nor destroyed, merely transferred from one place to
> another.  So what happens is that the enhanced intensity of the axial
> reflections has been transferred from neighbouring reflections which have
> their intensities diminished in total by the same amount.  In fact an
> oscillating Bessel function centred on the axis is superimposed on the
> intensities so you get cylindical zones of alternating enhanced and
> diminishing intensities with the magnitude of the oscillations dying away
> as you go further from the axis.  However the energy conservation law
> requires that the net overall average I must be unchanged by the presence
> of the axis.
>
> This implies that the 1/e correction factor SHOULD NOT be included for
> pure rotation axes (and mirror planes) when summing for the mean I.
> However the sums should be performed over a complete hemisphere given only
> one symmetry equivalent per reflection, which means that the multiplicity
> (m) factor SHOULD be included.  This is the direct opposite of what the
> code above is doing (i.e. it includes e but not m!).  The Fortran code
> would look like:
>
>        SI(NT) = SI(NT) + M*FI     # Total energy is conserved!
>        SN(NT) = SN(NT) + M       # Count reflections in hemisphere.
>
> Note that the statistically valid procedure will be different if one is
> say summing Is for a likelihood function, since this requires that the
> terms are statistically independent so one would then add only one term per
> equivalent, not a complete hemisphere, i.e. in that case the multiplicity
> factors should be omitted.
>
> For screw axes (and glide planes) the situation is different: there is no
> Bessel function and only the axial reflections are affected, so therefore
> it requires different handling in the code (as I said above, this is not
> happening!).  Now, since systematic absences are normally not present in
> the data, the mean intensity of the remaining reflections is enhanced by
> the e factor, purely by the action of omitting the systematic absences of
> zero I.  This implies that we need to simulate the presence of the
> systematic absences when taking the mean.  So for example in the PG6 case
> we would have to sum the intensities of the 00l, l=6n reflections WITHOUT
> correction, but then count each 00l reflection as though it were 6
> reflections (i.e. also counting the omitted sys. abs. in the average), so
> the code would now look like:
>
>        SI(NT) = SI(NT) + M*FI     # Total energy is still conserved!
>        SN(NT) = SN(NT) +M*E    # but also count the sys. abs. that were
> omitted.
>
> This implies that at least for the reflection counts, the e correction
> factor SHOULD be included for screw axes (and glide planes), and for the
> same reason as above the m factor SHOULD also be included.
>
> The differences between rotation and screw axes (or between mirrors and
> glides) arise because Wilson's assumption of uniform random distribution of
> atoms breaks down in the former case: an atom cannot approach a rotation
> axis or mirror plane closer than its VDW radius, so this excluded zone
> along the axis or plane causes an interference effect.  In both cases the
> main effect is actually that in projection along the axis the atomic
> positions are not random: they are correlated by an apparent inversion
> centre (it's actually another interference effect this time from pairs of
> atoms lying in the same plane of reflection and related by symmetry).
>
> All the above is relevant ONLY to taking the average intensity.  For other
> purposes the correct procedure is likely to differ.  For example if one is
> interested in the individual normalised structure amplitudes for direct
> methods (i.e. not just the mean), then the sqrt(1/e) factor clearly SHOULD
> be applied to the individual amplitudes.  Also if one is calculating higher
> moments of Z (= normalised I) then the deviations will not cancel (there's
> nothing in the energy conservation law that says that energy^n is conserved
> if n is not 0 or 1).  In the rotation axis case each large on-axis positive
> deviation tends to be offset by several small off-axis deviations, so in
> that case the optimal procedure would appear to be to multiply the on-axis
> Is by m/e before summing for the higher moments (say n >= 2), e.g.:
>
>       SM(N) = SM(N) + M*(FI/E)**N
>       NM(N) = NM(N) + M
>
> For the screw-axis case with the sys. abs. omitted the situation is again
> different and requires different code; IMO we should distribute an amount
> I/e from each axial reflection equally among the omitted sys. abs. and then
> include them as though they were all present:
>
>       SM(N) = SM(N) + M*E*(FI/E)**N
>       NM(N) = NM(N) + M*E
>
> One further issue needs to be addressed (TRUNCATE is not short of issues -
> the ones I mention here are only a fraction!).  The code for calculating
> the moments in TRUNCATE is:
>
> C---- Sums for moments of I
>          F = FFA(1)/WEIGHT
>          IF(F.GT.0.0) SMMEMM(1,NT,ICEN) = SMMEMM(1,NT,ICEN) + sqrt(F)
>          IF(F.GT.0.0) SMMEMM(3,NT,ICEN) = SMMEMM(3,NT,ICEN) + F*sqrt(F)
>          SMMIMM(1,NT,ICEN) = SMMIMM(1,NT,ICEN) + F
>          SMMIMM(2,NT,ICEN) = SMMIMM(2,NT,ICEN) + F**2
>          SMMIMM(3,NT,ICEN) = SMMIMM(3,NT,ICEN) + F**3
>          SMMIMM(4,NT,ICEN) = SMMIMM(4,NT,ICEN) + F**4
>          NMMNUM(NT,ICEN) = NMMNUM(NT,ICEN) + 1
>
> Notwithstanding that the WEIGHT factor (aka epsilon) probably should not
> be applied to the lower moments, and the multiplicity factor probably
> should be applied everywhere, there is still a problem here.  F is the
> uncorrected intensity but IMO we should be using the corrected intensity
> (i.e. by F & W's Bayesian procedure): why else are we calculating the
> correction if not to apply it?  The uncorrected intensities may be negative
> which adds spurious noise to the moments (a negative intensity makes the
> same contribution to an even moment as an equal positive intensity: this
> cannot possibly be correct).  However using the corrected Is brings another
> problem: when calculating the moments we should be using the expected
> values based on the posterior probability distribution of the Is.  This
> means integrating the moments over all expected Is > 0, multiplied by the
> probability density.  This is non-trivial, however there is a way using the
> parabolic cylinder functions U(a,x) (
> http://en.wikipedia.org/wiki/Parabolic_cylinder_functions).  The nice
> thing about this method is one can generalise it to calculate the
> Bayesian-corrected F and I as well as arbitrary moments and get rid of F &
> W's cubic spline interpolation code with the slightly worrying warning (see
> NEGIAS & NEGICS s/rs in TRUNCATE):
>
> C---- Accuracy - better than 5 percent  (i think).
>
> With the PCFs the code becomes both much simpler and much more accurate
> (to REAL*4 precision, so ~ 0.0001% accuracy!).
>
> I should also say that I believe that CTRUNCATE is not immune from these
> issues: it seems to be a straight translation at least in part from Fortran
> to C++ (though I'm far from being a C++ expert so I'll leave fixing
> CTRUNCATE to those who know what they're doing!).
>
> Sorry about the length of this: at least you can't say I didn't consult
> you!
>
> Cheers
>
> -- Ian
>
>

Re: [ccp4bb] TRUNCATE & CTRUNCATE issues.

Reply via email to