Robbie That sounds more like a programming error: the issues I've identified are purely numerical and won't affect the program flow.
Cheers -- Ian On 4 December 2013 09:23, Robbie Joosten <[email protected]> wrote: > Hi Ian, > > > > This probably doesn't affect the MTZ file that comes out of it > > but it will affect the statistics of the twinning tests. > > I'd be interested to see if this solves the cases where ctruncate goes > into an infinite loop during the twinning tests. > > Cheers, > Robbie > > ------------------------------ > Date: Sat, 30 Nov 2013 15:37:44 +0000 > From: [email protected] > > Subject: [ccp4bb] TRUNCATE & CTRUNCATE issues. > To: [email protected] > > > > Hello All, I'd like to get your views on some changes I'm proposing to > make to the TRUNCATE source code. IMO there are some issues with the way > TRUNCATE does its statistical analyses which need to be fixed. This > probably doesn't affect the MTZ file that comes out of it but it will > affect the statistics of the twinning tests. I've been meaning to do this > for some time but it involves some fairly radical changes; now I've finally > decided to bite the bullet. > > One problem is that there seems to be some confusion in the source code > comments concerning the meaning of what I call the "symmetry enhancement > factor" (aka "epsilon" or "e" below). This is the point-group dependent > factor by which the mean intensities of special rows and zones are enhanced > by symmetry; for example in PG121 the mean I of the 0k0 reflections is > enhanced by a factor of 2; in PG6 & PG622, it's a factor of 6 for the 00l > reflections, and so on. > > For space groups with screw axes (or glide planes in enantiomorphic SGs, > i.e. the crystal contains both the asymmetric unit and its mirror image), > the mean is multiplied by e only if the systematic absences are omitted > from the sums; if you include them then there is no overall enhancement. > So rotation and screw axes (or mirrors and glide planes) need to be treated > differently, yet the CCP4 library code for this (s/rs EPSLN & EPSLON) treat > them identically (more on this below). TRUNCATE calls epsilon > alternatively a "multiplicity" or "weight" which further adds to the > confusion since multiplicity (m) usually means something completely > different (it's the number of times a symmetry-equivalent reflection occurs > in a full hemisphere of data, so for PG222 axial reflections (h00 etc.) > m=1, for zero layers (hk0 etc.) m=2, and for general hkl m=4: in contrast e > = 2, 1 and 1 resp.). > > This snippet of code from TRUNCATE which accumulates sums in bins > according to d*^2 illustrates the problem: > > CALL EPSLON(INHKL,WEIGHT,ISYSAB) > C > FF(NT) = FF(NT) + F/WEIGHT > SD(NT) = SD(NT) + SS/WEIGHT > N(NT) = N(NT) + 1 > AMULT = NSYM/WEIGHT > C > ... > C Accumulate sums for Wilson plot. > C > SN(NT) = SN(NT) + AMULT > SW(NT) = SW(NT) + AMULT*FFSCAT > SR(NT) = SR(NT) + AMULT*Q > SI(NT) = SI(NT) + AMULT*F > > Here WEIGHT = epsilon (the ISYSAB flag is ignored throughout), so all sums > are being accumulated with the terms multiplied by 1/e (NSYM is the no. of > asymmetric units so is constant and doesn't affect the results). However > IMO this factor should be applied only to individual intensities or their > SDs (note that in the above code and that below, F is actually the > uncorrected intensity just to further confuse you!). The problem here is > that for pure rotation axes the law of conservation of energy requires that > the overall mean I is unchanged. In any interference phenomenon energy can > neither be created nor destroyed, merely transferred from one place to > another. So what happens is that the enhanced intensity of the axial > reflections has been transferred from neighbouring reflections which have > their intensities diminished in total by the same amount. In fact an > oscillating Bessel function centred on the axis is superimposed on the > intensities so you get cylindical zones of alternating enhanced and > diminishing intensities with the magnitude of the oscillations dying away > as you go further from the axis. However the energy conservation law > requires that the net overall average I must be unchanged by the presence > of the axis. > > This implies that the 1/e correction factor SHOULD NOT be included for > pure rotation axes (and mirror planes) when summing for the mean I. > However the sums should be performed over a complete hemisphere given only > one symmetry equivalent per reflection, which means that the multiplicity > (m) factor SHOULD be included. This is the direct opposite of what the > code above is doing (i.e. it includes e but not m!). The Fortran code > would look like: > > SI(NT) = SI(NT) + M*FI # Total energy is conserved! > SN(NT) = SN(NT) + M # Count reflections in hemisphere. > > Note that the statistically valid procedure will be different if one is > say summing Is for a likelihood function, since this requires that the > terms are statistically independent so one would then add only one term per > equivalent, not a complete hemisphere, i.e. in that case the multiplicity > factors should be omitted. > > For screw axes (and glide planes) the situation is different: there is no > Bessel function and only the axial reflections are affected, so therefore > it requires different handling in the code (as I said above, this is not > happening!). Now, since systematic absences are normally not present in > the data, the mean intensity of the remaining reflections is enhanced by > the e factor, purely by the action of omitting the systematic absences of > zero I. This implies that we need to simulate the presence of the > systematic absences when taking the mean. So for example in the PG6 case > we would have to sum the intensities of the 00l, l=6n reflections WITHOUT > correction, but then count each 00l reflection as though it were 6 > reflections (i.e. also counting the omitted sys. abs. in the average), so > the code would now look like: > > SI(NT) = SI(NT) + M*FI # Total energy is still conserved! > SN(NT) = SN(NT) +M*E # but also count the sys. abs. that were > omitted. > > This implies that at least for the reflection counts, the e correction > factor SHOULD be included for screw axes (and glide planes), and for the > same reason as above the m factor SHOULD also be included. > > The differences between rotation and screw axes (or between mirrors and > glides) arise because Wilson's assumption of uniform random distribution of > atoms breaks down in the former case: an atom cannot approach a rotation > axis or mirror plane closer than its VDW radius, so this excluded zone > along the axis or plane causes an interference effect. In both cases the > main effect is actually that in projection along the axis the atomic > positions are not random: they are correlated by an apparent inversion > centre (it's actually another interference effect this time from pairs of > atoms lying in the same plane of reflection and related by symmetry). > > All the above is relevant ONLY to taking the average intensity. For other > purposes the correct procedure is likely to differ. For example if one is > interested in the individual normalised structure amplitudes for direct > methods (i.e. not just the mean), then the sqrt(1/e) factor clearly SHOULD > be applied to the individual amplitudes. Also if one is calculating higher > moments of Z (= normalised I) then the deviations will not cancel (there's > nothing in the energy conservation law that says that energy^n is conserved > if n is not 0 or 1). In the rotation axis case each large on-axis positive > deviation tends to be offset by several small off-axis deviations, so in > that case the optimal procedure would appear to be to multiply the on-axis > Is by m/e before summing for the higher moments (say n >= 2), e.g.: > > SM(N) = SM(N) + M*(FI/E)**N > NM(N) = NM(N) + M > > For the screw-axis case with the sys. abs. omitted the situation is again > different and requires different code; IMO we should distribute an amount > I/e from each axial reflection equally among the omitted sys. abs. and then > include them as though they were all present: > > SM(N) = SM(N) + M*E*(FI/E)**N > NM(N) = NM(N) + M*E > > One further issue needs to be addressed (TRUNCATE is not short of issues - > the ones I mention here are only a fraction!). The code for calculating > the moments in TRUNCATE is: > > C---- Sums for moments of I > F = FFA(1)/WEIGHT > IF(F.GT.0.0) SMMEMM(1,NT,ICEN) = SMMEMM(1,NT,ICEN) + sqrt(F) > IF(F.GT.0.0) SMMEMM(3,NT,ICEN) = SMMEMM(3,NT,ICEN) + F*sqrt(F) > SMMIMM(1,NT,ICEN) = SMMIMM(1,NT,ICEN) + F > SMMIMM(2,NT,ICEN) = SMMIMM(2,NT,ICEN) + F**2 > SMMIMM(3,NT,ICEN) = SMMIMM(3,NT,ICEN) + F**3 > SMMIMM(4,NT,ICEN) = SMMIMM(4,NT,ICEN) + F**4 > NMMNUM(NT,ICEN) = NMMNUM(NT,ICEN) + 1 > > Notwithstanding that the WEIGHT factor (aka epsilon) probably should not > be applied to the lower moments, and the multiplicity factor probably > should be applied everywhere, there is still a problem here. F is the > uncorrected intensity but IMO we should be using the corrected intensity > (i.e. by F & W's Bayesian procedure): why else are we calculating the > correction if not to apply it? The uncorrected intensities may be negative > which adds spurious noise to the moments (a negative intensity makes the > same contribution to an even moment as an equal positive intensity: this > cannot possibly be correct). However using the corrected Is brings another > problem: when calculating the moments we should be using the expected > values based on the posterior probability distribution of the Is. This > means integrating the moments over all expected Is > 0, multiplied by the > probability density. This is non-trivial, however there is a way using the > parabolic cylinder functions U(a,x) ( > http://en.wikipedia.org/wiki/Parabolic_cylinder_functions). The nice > thing about this method is one can generalise it to calculate the > Bayesian-corrected F and I as well as arbitrary moments and get rid of F & > W's cubic spline interpolation code with the slightly worrying warning (see > NEGIAS & NEGICS s/rs in TRUNCATE): > > C---- Accuracy - better than 5 percent (i think). > > With the PCFs the code becomes both much simpler and much more accurate > (to REAL*4 precision, so ~ 0.0001% accuracy!). > > I should also say that I believe that CTRUNCATE is not immune from these > issues: it seems to be a straight translation at least in part from Fortran > to C++ (though I'm far from being a C++ expert so I'll leave fixing > CTRUNCATE to those who know what they're doing!). > > Sorry about the length of this: at least you can't say I didn't consult > you! > > Cheers > > -- Ian > >
