Re: [ccp4bb] [proc-develop] Fwd: [ccp4bb] How to divide an equal amount of reflections into 20 bins? (like Staraniso does it)

Ian Tickle Wed, 03 Dec 2025 09:05:01 -0800

Dear All

I thought it worth adding for completeness a further point concerning
anisotropic cutoffs to Clemens's excellent explanation.  I have come across
claims that an anisotropic cutoff distorts the intensity statistics.  I
will argue and demonstrate that actually the exact opposite is true: NOT
using an anisotropic cutoff can bias the moments away from their expected
values based on error-free data.


For the error-free acentric Wilson distribution it's easy to show that the
2nd raw moment E(Z^2) of the normalised intensity Z = I / E(I) is 2.  The
r'th raw moment of a random variable x is the statistical expectation (mean
value) of the r'th power of x (x^r: r need not be an integer).  A
"raw moment" is one where the deviations are relative to zero as opposed to
a "central moment" where deviations are relative to the mean (the mean is
the 1st raw moment, the variance is the 2nd central moment, the skewness is
the 3rd central moment etc.).  Moments have an important practical
application in the detection of twinning.

In reality there will always be experimental errors and if we assume the
usual approximation of Gaussian random errors in the intensities, one can
show that the 2nd raw moment of the normalised intensity for the acentric
Wilson distribution is:

                  E(Z^2) = 2 + <sigma(I)^2> / S^2

where S is the Wilson distribution parameter.  This means that for weak
data where sigma(I) < S, the errors have the effect of inflating the
moments (one can show by generalising the above expression that the effect
is even greater for higher order moments r > 2).  Since the effect of
twinning is to reduce all the moments with r > 1 (i.e. the twinned
intensity distribution has fewer extreme values compared with the untwinned
case), the effect of random errors will be to mask the effects of twinning
in the tests.

For my test data (courtesy of Eleanor), STARANISO gives anisotropic
diffraction limits (1.96, 1.62, 1.40) so that for reflections at lower
resolution than the lowest d* (1/1.96) the data are almost complete and the
effect of weak data on the moments is negligible.  Between the lower and
upper limits (d* = 1/1.96 to 1/1.40) the proportion of weak data inside a
sphere will progressively increase causing the moments to increase
according to the equation above.  Beyond the bounding sphere of radius d* =
1/1.4 where all the data are weak, the moments tend to infinity as S
approaches zero, so one definitely does not want to include those, but by
the same argument one does not want to include weak data between the
anisotropic cutoff surface and the bounding sphere either.  This is where
the anisotropic cutoff comes in.  The usual isotropic completeness will
progressively decrease between these limits, whereas the more relevant
anisotropic completeness (defined as the proportion of possible observed
reflections that are actually observed) will remain high, and the moments
will remain at their expected values for error-free data.  It is not
possible for weak data to be observed (by definition) so they should not be
counted as 'possibles'.

The attached plot demonstrates this for the test data (acentric
reflections only).  This was run first with CTRUNCATE which applies an
anisotropy correction but not an anisotropic cutoff.  CTRUNCATE outputs the
same reflections used for the plot to its output MTZ file.  Then the same
data was run with STARANISO and the output of that put through CTRUNCATE
just to get comparable plots (in that case CTRUNCATE would not apply an
anisotropy correction because the data have already been
anisotropy-corrected by STARANISO).  STARANISO only outputs the reflections
used in the plot.

One can see exactly the effects described above in the plots.  There are
progressively fewer reflections in the range d* = 1/1.96 to 1/1.4 in the
STARANISO plot because they have been cut by the anisotropic cutoff leaving
only the "good stuff".  Note that CTRUNCATE mislabels the plots as moments
of I (<I^2>) whereas they are plainly moments of Z (<Z^2> = <I^2> / <I>^2).

One final point: the equation shows that tests should always be performed
on expectations NOT on measured values: this is very important.  STARANISO
goes to considerable lengths to compute expectations, typically using local
averages or global solid harmonics function fitting.  So a reflection is
classed as "weak" only when the local average or interpolated mean
I/sigma(I) of neighbouring reflections is low (usually < 1.2), not
necessarily when the individual measured I/sigma(I) is low.  Some legacy
programs apply cutoffs using the measured I/sigma(I) or F/sigma(F) as a
criterion: never a good idea!

 Cheers

-- Ian
(Global Phasing, Cambridge, UK).



> > ---------- Forwarded message ---------
>> > From: Clemens Vonrhein <[email protected]>
>> > Date: Mon, 24 Nov 2025 at 18:58
>> > Subject: Re: [ccp4bb] How to divide an equal amount of reflections into
>> 20
>> > bins? (like Staraniso does it)
>> > To: <[email protected]>
>> >
>> >
>> > Dear all,
>> >
>> > Just a few technical points and clarifications ... to avoid
>> > unneceessary confusion in the future (CCP4bb postings can have a long
>> > shelf life) :-)
>> >
>> > On Tue, Nov 18, 2025 at 06:55:15PM +0000, Martin Moche wrote:
>> > > As we all know STARANISO selects the best ellipse, instead of
>> > > sphere, when integrating reflections.
>> >
>> > STARANISO does not do any integration: that is part of the actual
>> > integration program reading the raw diffraction images and turning
>> > them into a list of unscaled+unmerged intensities (and sigmas).
>> >
>> > The next step then is to determine appropriate scale/correction
>> > factors to turn those into a set of scaled+unmerged
>> > intensities. Those could be fed into STARANISO, but more typically it
>> > uses the scaled+merged (using inverse-variance weighting) intensities
>> > to determine a cutoff surface.
>> >
>> > That cutoff can have any shape that is constrained to the point-group
>> > symmetry (plus a centre of symmetry): the term "anisotropic" does not
>> > imply that it has to be an ellipsoid - just that it is not constrained
>> > to be isotropic (as in a spherical cutoff surface based on per-bin
>> > statistics and a single high-resolution value) [1].
>> >
>> > STARANISO fits an ellipsoid to that generic anisotropic cutoff surface
>> > for two main reasons: (1) providing three diffraction limits along the
>> > axes of that ellipsoid to have a reasonably simple description of any
>> > potential anisotropic diffraction, and (2) computing statistics -
>> > especially completeness - to accurately describe the outcome of the
>> > diffraction experiment.
>> >
>> > > In my experience this is good because
>> > >
>> > >
>> > > 1. Crystals generally diffract better in some directions than
>> > >    others
>> > > 2. The electron density maps look better after STARANISO elliptical
>> > >    data truncation, as compared to spherical data truncation in XDS,
>> > >    XDSAPP3, DIALS etc. so it becomes easier and faster to build a
>> > >    model and refine a structure.
>> >
>> > One caveat depending on the refinement program/procedure used: the DFc
>> > completion used by most (all?) refinement programs will provide purely
>> > model-based map coefficients to the electron density map for those
>> > reflections missing in the initial list of reflections in the input
>> > MTZ file. That is a sensible approach when those reflections are
>> > missing because they weren't measured although we assume they should be
>> > observable (think: reflections behind beamstop, in detector module
>> > gaps or lost due to cusp).
>> >
>> > It becomes problematic when there is no check against "observability"
>> > at the map computation stage and one just uses all reflections with
>> > Miller indices in the MTZ file to assign DFc values to the 2mFo-DFc
>> > electron density maps when there is no Fo value - e.g. for highly
>> > anisotropic data (where a large number of reflections are unobservable
>> > in some directions at higher resolution and those are therefore not
>> > present in the list of Fo).
>> >
>> > This is why BUSTER will output three different sets of
>> > electron-density map coefficients:
>> >
>> >   2FOFCWT            / PH2FOCWT              <<< for all Fo
>> >   2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill   <<< for all HKL within the
>> > cutoff surface, i.e. defined as "observable"
>> >   2FOFCWT_iso-fill   / PH2FOCWT_iso-fill     <<< to the highest d* of
>> any
>> > Miller index in the input file
>> >
>> > (Other refinement packages do something similar and provide several
>> > map coefficients). As you can see, using the last set (map
>> > coefficients in a sphere) can create artificially "nice" maps due to
>> > model bias: if you see near perfect density over the full model you
>> > might want to check if the correct electron density map is used
>> > (2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill is what we would recommend).
>> >
>> > If you used the STARANISO output file as input into refinement and on
>> > output you only get that last set of map coefficients (i.e. all
>> > reflections in a sphere to the high resolution limit), you could run
>> > something like the following set of commands to fix this:
>> >
>> >   cad hklin1 staraniso_alldata-unique.mtz \
>> >       hklin2 refinement_out.mtz \
>> >       hklout tmp.mtz \
>> >       <<e
>> >   LABI FILE 1 E1=SA_flag
>> >   LABI FILE 2 E1=F E2=SIGF E3=FWT E4=PHWT
>> >   e
>> >
>> >   sftools <<e
>> >   read tmp.mtz
>> >   select col SA_flag > 0
>> >   calc F col 2FOFCWT_aniso-fill = col FWT
>> >   calc P col PH2FOFCWT_aniso-fill = col PHWT
>> >   select all
>> >   calc F col 2FOFCWT_iso-fill = col FWT
>> >   calc P col PH2FOFCWT_iso-fill = col PHWT
>> >   delete col FWT
>> >   delete col PHWT
>> >   select col F = present
>> >   calc F col 2FOFCWT = col 2FOFCWT_iso-fill
>> >   calc P col PH2FOFCWT = col PH2FOFCWT_iso-fill
>> >   select all
>> >   write e-density.mtz
>> >   stop
>> >   e
>> >
>> > Now you'll have all three sets of electron-density map coefficients at
>> > your disposal.
>> >
>> > The above steps might be necessary especially when you run refinement
>> > through an interface that tries to be helpful and "completes" the
>> > incoming MTZ file with test-set flags right to largest d* value of any
>> > observed reflection [2]. Those dummy Miller indices (denoting a
>> > reciprocal lattice point with no associated data) will receive purely
>> > model-based DFc values for electron density map computations.
>> >
>> > > To my understanding AIMLESS assumes 100 % completeness when creating
>> > > its 20 bins, and elliptically truncated data
>> >
>> > STARANISO doesn't truncate the data ellipsoidally: it uses the
>> > anisotropic cutoff surface as described above (the ellipsoid is just a
>> > construct to make reporting easier).
>> >
>> > > has extremely low completeness in the highest resolution shells,
>> >
>> > True - if we assume that all reciprocal lattice points (Miller
>> > indices) within the sphere out to the high-resolution value of the
>> > outer shell are potentially observable. But that isn't quite what
>> > "completeness" tries to measure: we want to know how many reflections
>> > we measured relative to the number of reflections we could've
>> > measured. This is something quite different.
>> >
>> > If we had a crystal with perfect isotropic diffraction we are happy to
>> > define some outer limit (i.e. a sphere described by a radius) where we
>> > say: "We could have measured all reflections within that sphere and
>> > everything outside of that sphere is just not observable (e.g. too
>> > weak)". That makes complete sense: we don't want to pretend that every
>> > crystal diffracts to 0.5A resolution and therefore we should compute
>> > overall completeness always to a high-resolution limit of 0.5A
>> > ... that would be silly.
>> >
>> > The same applies to a crystal with anisotropic diffraction: we need to
>> > define a region in reciprocal space where we can say the same thing
>> > ("inside is observable and outside is not"). STARANISO provides us
>> > that shape as an ellipsoid defined by three axis lengths (diffraction
>> > limits). So now we compute the completeness relative to the
>> > reflections within that shape - not much different to the concept of a
>> > sphere, but more accurate in this case.
>> >
>> > > so Table 1 is looking very odd with extremely low completeness in
>> > > the highest resolution shell when using REFMAC5 after STARANISO.
>> >
>> > That is why STARANISO (or autoPROC) report the "Completeness
>> > (ellipsoidal)" in their Table1 and the deposition-ready mmCIF file [3]
>> > - which is what we think should be reported in papers etc.
>> >
>> > Hope that helps to clarify some of the finer details.
>> >
>> > Cheers
>> >
>> > Clemens & Ian
>> >
>> > [1] https://staraniso.globalphasing.org/anisotropy_about.html
>> > [2] https://staraniso.globalphasing.org/test_set_flags_about.html
>> > [3] see eg. https://staraniso.globalphasing.org/table1/q6/8q68.html
>> >
>> > ########################################################################
>> >
>> > To unsubscribe from the CCP4BB list, click the following link:
>> > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>> >
>> > This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
>> mailing
>> > list hosted by www.jiscmail.ac.uk, terms & conditions are available at
>> > https://www.jiscmail.ac.uk/policyandsecurity/
>>
>> > _______________________________________________
>> > proc-develop mailing list
>> > [email protected]
>> > http://wwi.globalphasing.com/mailman/listinfo/proc-develop
>>
>>
>> --
>>
>> *--------------------------------------------------------------
>> * Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
>> * Global Phasing Ltd., 9 Journey Campus, Castle Park
>> * Cambridge CB3 0AX, UK                   www.globalphasing.com
>> *--------------------------------------------------------------
>>
>

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/

Re: [ccp4bb] [proc-develop] Fwd: [ccp4bb] How to divide an equal amount of reflections into 20 bins? (like Staraniso does it)

Reply via email to