Re: [ccp4bb] [proc-develop] Fwd: [ccp4bb] How to divide an equal amount of reflections into 20 bins? (like Staraniso does it)

Ian Tickle Wed, 03 Dec 2025 23:07:51 -0800

Hi Randy

Thanks, that's a neat tip, I'll give it a try.


Cheers

-- Ian


On Thu, 4 Dec 2025, 03:02 Randy John Read, <[email protected]> wrote:

> Dear Ian,
>
> Those are good points, but it’s worth noting that in the plots that Phaser
> produces for the observed vs theoretical second moments, the effects of the
> intensity standard deviations are taken into account in the theoretical
> curve. Seeing that the curves match serves as evidence both that the
> crystal is untwinned and that the standard deviations have been estimated
> appropriately.
>
> Best wishes,
>
> Randy
>
> -----
> Randy J. Read
> Department of Haematology, University of Cambridge
> Cambridge Institute for Medical Research     Tel: +44 1223 336500
> The Keith Peters Building
> Hills Road                                                       E-mail:
> [email protected]
> Cambridge CB2 0XY, U.K.
> www-structmed.cimr.cam.ac.uk
>
> *From: *CCP4 bulletin board <[email protected]> on behalf of Ian
> Tickle <[email protected]>
> *Date: *Wednesday, 3 December 2025 at 17:05
> *To: *[email protected] <[email protected]>
> *Subject: *Re: [ccp4bb] [proc-develop] Fwd: [ccp4bb] How to divide an
> equal amount of reflections into 20 bins? (like Staraniso does it)
>
>
> Dear All
>
> I thought it worth adding for completeness a further point concerning
> anisotropic cutoffs to Clemens's excellent explanation.  I have come across
> claims that an anisotropic cutoff distorts the intensity statistics.  I
> will argue and demonstrate that actually the exact opposite is true: NOT
> using an anisotropic cutoff can bias the moments away from their expected
> values based on error-free data.
>
> For the error-free acentric Wilson distribution it's easy to show that the
> 2nd raw moment E(Z^2) of the normalised intensity Z = I / E(I) is 2.  The
> r'th raw moment of a random variable x is the statistical expectation (mean
> value) of the r'th power of x (x^r: r need not be an integer).  A
> "raw moment" is one where the deviations are relative to zero as opposed to
> a "central moment" where deviations are relative to the mean (the mean is
> the 1st raw moment, the variance is the 2nd central moment, the skewness is
> the 3rd central moment etc.).  Moments have an important practical
> application in the detection of twinning.
>
> In reality there will always be experimental errors and if we assume the
> usual approximation of Gaussian random errors in the intensities, one can
> show that the 2nd raw moment of the normalised intensity for the acentric
> Wilson distribution is:
>
>                   E(Z^2) = 2 + <sigma(I)^2> / S^2
>
> where S is the Wilson distribution parameter.  This means that for weak
> data where sigma(I) < S, the errors have the effect of inflating the
> moments (one can show by generalising the above expression that the effect
> is even greater for higher order moments r > 2).  Since the effect of
> twinning is to reduce all the moments with r > 1 (i.e. the twinned
> intensity distribution has fewer extreme values compared with the untwinned
> case), the effect of random errors will be to mask the effects of twinning
> in the tests.
>
> For my test data (courtesy of Eleanor), STARANISO gives anisotropic
> diffraction limits (1.96, 1.62, 1.40) so that for reflections at lower
> resolution than the lowest d* (1/1.96) the data are almost complete and the
> effect of weak data on the moments is negligible.  Between the lower and
> upper limits (d* = 1/1.96 to 1/1.40) the proportion of weak data inside a
> sphere will progressively increase causing the moments to increase
> according to the equation above.  Beyond the bounding sphere of radius d* =
> 1/1.4 where all the data are weak, the moments tend to infinity as S
> approaches zero, so one definitely does not want to include those, but by
> the same argument one does not want to include weak data between the
> anisotropic cutoff surface and the bounding sphere either.  This is where
> the anisotropic cutoff comes in.  The usual isotropic completeness will
> progressively decrease between these limits, whereas the more relevant
> anisotropic completeness (defined as the proportion of possible observed
> reflections that are actually observed) will remain high, and the moments
> will remain at their expected values for error-free data.  It is not
> possible for weak data to be observed (by definition) so they should not be
> counted as 'possibles'.
>
> The attached plot demonstrates this for the test data (acentric
> reflections only).  This was run first with CTRUNCATE which applies an
> anisotropy correction but not an anisotropic cutoff.  CTRUNCATE outputs the
> same reflections used for the plot to its output MTZ file.  Then the same
> data was run with STARANISO and the output of that put through CTRUNCATE
> just to get comparable plots (in that case CTRUNCATE would not apply an
> anisotropy correction because the data have already been
> anisotropy-corrected by STARANISO).  STARANISO only outputs the reflections
> used in the plot.
>
> One can see exactly the effects described above in the plots.  There are
> progressively fewer reflections in the range d* = 1/1.96 to 1/1.4 in the
> STARANISO plot because they have been cut by the anisotropic cutoff leaving
> only the "good stuff".  Note that CTRUNCATE mislabels the plots as moments
> of I (<I^2>) whereas they are plainly moments of Z (<Z^2> = <I^2> / <I>^2).
>
> One final point: the equation shows that tests should always be performed
> on expectations NOT on measured values: this is very important.  STARANISO
> goes to considerable lengths to compute expectations, typically using local
> averages or global solid harmonics function fitting.  So a reflection is
> classed as "weak" only when the local average or interpolated mean
> I/sigma(I) of neighbouring reflections is low (usually < 1.2), not
> necessarily when the individual measured I/sigma(I) is low.  Some legacy
> programs apply cutoffs using the measured I/sigma(I) or F/sigma(F) as a
> criterion: never a good idea!
>
>  Cheers
>
> -- Ian
> (Global Phasing, Cambridge, UK).
>
>
>
> > ---------- Forwarded message ---------
> > From: Clemens Vonrhein <[email protected]>
> > Date: Mon, 24 Nov 2025 at 18:58
> > Subject: Re: [ccp4bb] How to divide an equal amount of reflections into
> 20
> > bins? (like Staraniso does it)
> > To: <[email protected]>
> >
> >
> > Dear all,
> >
> > Just a few technical points and clarifications ... to avoid
> > unneceessary confusion in the future (CCP4bb postings can have a long
> > shelf life) :-)
> >
> > On Tue, Nov 18, 2025 at 06:55:15PM +0000, Martin Moche wrote:
> > > As we all know STARANISO selects the best ellipse, instead of
> > > sphere, when integrating reflections.
> >
> > STARANISO does not do any integration: that is part of the actual
> > integration program reading the raw diffraction images and turning
> > them into a list of unscaled+unmerged intensities (and sigmas).
> >
> > The next step then is to determine appropriate scale/correction
> > factors to turn those into a set of scaled+unmerged
> > intensities. Those could be fed into STARANISO, but more typically it
> > uses the scaled+merged (using inverse-variance weighting) intensities
> > to determine a cutoff surface.
> >
> > That cutoff can have any shape that is constrained to the point-group
> > symmetry (plus a centre of symmetry): the term "anisotropic" does not
> > imply that it has to be an ellipsoid - just that it is not constrained
> > to be isotropic (as in a spherical cutoff surface based on per-bin
> > statistics and a single high-resolution value) [1].
> >
> > STARANISO fits an ellipsoid to that generic anisotropic cutoff surface
> > for two main reasons: (1) providing three diffraction limits along the
> > axes of that ellipsoid to have a reasonably simple description of any
> > potential anisotropic diffraction, and (2) computing statistics -
> > especially completeness - to accurately describe the outcome of the
> > diffraction experiment.
> >
> > > In my experience this is good because
> > >
> > >
> > > 1. Crystals generally diffract better in some directions than
> > >    others
> > > 2. The electron density maps look better after STARANISO elliptical
> > >    data truncation, as compared to spherical data truncation in XDS,
> > >    XDSAPP3, DIALS etc. so it becomes easier and faster to build a
> > >    model and refine a structure.
> >
> > One caveat depending on the refinement program/procedure used: the DFc
> > completion used by most (all?) refinement programs will provide purely
> > model-based map coefficients to the electron density map for those
> > reflections missing in the initial list of reflections in the input
> > MTZ file. That is a sensible approach when those reflections are
> > missing because they weren't measured although we assume they should be
> > observable (think: reflections behind beamstop, in detector module
> > gaps or lost due to cusp).
> >
> > It becomes problematic when there is no check against "observability"
> > at the map computation stage and one just uses all reflections with
> > Miller indices in the MTZ file to assign DFc values to the 2mFo-DFc
> > electron density maps when there is no Fo value - e.g. for highly
> > anisotropic data (where a large number of reflections are unobservable
> > in some directions at higher resolution and those are therefore not
> > present in the list of Fo).
> >
> > This is why BUSTER will output three different sets of
> > electron-density map coefficients:
> >
> >   2FOFCWT            / PH2FOCWT              <<< for all Fo
> >   2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill   <<< for all HKL within the
> > cutoff surface, i.e. defined as "observable"
> >   2FOFCWT_iso-fill   / PH2FOCWT_iso-fill     <<< to the highest d* of any
> > Miller index in the input file
> >
> > (Other refinement packages do something similar and provide several
> > map coefficients). As you can see, using the last set (map
> > coefficients in a sphere) can create artificially "nice" maps due to
> > model bias: if you see near perfect density over the full model you
> > might want to check if the correct electron density map is used
> > (2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill is what we would recommend).
> >
> > If you used the STARANISO output file as input into refinement and on
> > output you only get that last set of map coefficients (i.e. all
> > reflections in a sphere to the high resolution limit), you could run
> > something like the following set of commands to fix this:
> >
> >   cad hklin1 staraniso_alldata-unique.mtz \
> >       hklin2 refinement_out.mtz \
> >       hklout tmp.mtz \
> >       <<e
> >   LABI FILE 1 E1=SA_flag
> >   LABI FILE 2 E1=F E2=SIGF E3=FWT E4=PHWT
> >   e
> >
> >   sftools <<e
> >   read tmp.mtz
> >   select col SA_flag > 0
> >   calc F col 2FOFCWT_aniso-fill = col FWT
> >   calc P col PH2FOFCWT_aniso-fill = col PHWT
> >   select all
> >   calc F col 2FOFCWT_iso-fill = col FWT
> >   calc P col PH2FOFCWT_iso-fill = col PHWT
> >   delete col FWT
> >   delete col PHWT
> >   select col F = present
> >   calc F col 2FOFCWT = col 2FOFCWT_iso-fill
> >   calc P col PH2FOFCWT = col PH2FOFCWT_iso-fill
> >   select all
> >   write e-density.mtz
> >   stop
> >   e
> >
> > Now you'll have all three sets of electron-density map coefficients at
> > your disposal.
> >
> > The above steps might be necessary especially when you run refinement
> > through an interface that tries to be helpful and "completes" the
> > incoming MTZ file with test-set flags right to largest d* value of any
> > observed reflection [2]. Those dummy Miller indices (denoting a
> > reciprocal lattice point with no associated data) will receive purely
> > model-based DFc values for electron density map computations.
> >
> > > To my understanding AIMLESS assumes 100 % completeness when creating
> > > its 20 bins, and elliptically truncated data
> >
> > STARANISO doesn't truncate the data ellipsoidally: it uses the
> > anisotropic cutoff surface as described above (the ellipsoid is just a
> > construct to make reporting easier).
> >
> > > has extremely low completeness in the highest resolution shells,
> >
> > True - if we assume that all reciprocal lattice points (Miller
> > indices) within the sphere out to the high-resolution value of the
> > outer shell are potentially observable. But that isn't quite what
> > "completeness" tries to measure: we want to know how many reflections
> > we measured relative to the number of reflections we could've
> > measured. This is something quite different.
> >
> > If we had a crystal with perfect isotropic diffraction we are happy to
> > define some outer limit (i.e. a sphere described by a radius) where we
> > say: "We could have measured all reflections within that sphere and
> > everything outside of that sphere is just not observable (e.g. too
> > weak)". That makes complete sense: we don't want to pretend that every
> > crystal diffracts to 0.5A resolution and therefore we should compute
> > overall completeness always to a high-resolution limit of 0.5A
> > ... that would be silly.
> >
> > The same applies to a crystal with anisotropic diffraction: we need to
> > define a region in reciprocal space where we can say the same thing
> > ("inside is observable and outside is not"). STARANISO provides us
> > that shape as an ellipsoid defined by three axis lengths (diffraction
> > limits). So now we compute the completeness relative to the
> > reflections within that shape - not much different to the concept of a
> > sphere, but more accurate in this case.
> >
> > > so Table 1 is looking very odd with extremely low completeness in
> > > the highest resolution shell when using REFMAC5 after STARANISO.
> >
> > That is why STARANISO (or autoPROC) report the "Completeness
> > (ellipsoidal)" in their Table1 and the deposition-ready mmCIF file [3]
> > - which is what we think should be reported in papers etc.
> >
> > Hope that helps to clarify some of the finer details.
> >
> > Cheers
> >
> > Clemens & Ian
> >
> > [1] https://staraniso.globalphasing.org/anisotropy_about.html
> > [2] https://staraniso.globalphasing.org/test_set_flags_about.html
> > [3] see eg. https://staraniso.globalphasing.org/table1/q6/8q68.html
> >
> > ########################################################################
> >
> > To unsubscribe from the CCP4BB list, click the following link:
> > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
> >
> > This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
> mailing
> > list hosted by www.jiscmail.ac.uk, terms & conditions are available at
> > https://www.jiscmail.ac.uk/policyandsecurity/
>
> > _______________________________________________
> > proc-develop mailing list
> > [email protected]
> > http://wwi.globalphasing.com/mailman/listinfo/proc-develop
>
>
> --
>
> *--------------------------------------------------------------
> * Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
> * Global Phasing Ltd., 9 Journey Campus, Castle Park
> * Cambridge CB3 0AX, UK                   www.globalphasing.com
> *--------------------------------------------------------------
>
>
> ------------------------------
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/

Re: [ccp4bb] [proc-develop] Fwd: [ccp4bb] How to divide an equal amount of reflections into 20 bins? (like Staraniso does it)

Reply via email to