Hi Randy Thanks, that's a neat tip, I'll give it a try.
Cheers -- Ian On Thu, 4 Dec 2025, 03:02 Randy John Read, <[email protected]> wrote: > Dear Ian, > > Those are good points, but it’s worth noting that in the plots that Phaser > produces for the observed vs theoretical second moments, the effects of the > intensity standard deviations are taken into account in the theoretical > curve. Seeing that the curves match serves as evidence both that the > crystal is untwinned and that the standard deviations have been estimated > appropriately. > > Best wishes, > > Randy > > ----- > Randy J. Read > Department of Haematology, University of Cambridge > Cambridge Institute for Medical Research Tel: +44 1223 336500 > The Keith Peters Building > Hills Road E-mail: > [email protected] > Cambridge CB2 0XY, U.K. > www-structmed.cimr.cam.ac.uk > > *From: *CCP4 bulletin board <[email protected]> on behalf of Ian > Tickle <[email protected]> > *Date: *Wednesday, 3 December 2025 at 17:05 > *To: *[email protected] <[email protected]> > *Subject: *Re: [ccp4bb] [proc-develop] Fwd: [ccp4bb] How to divide an > equal amount of reflections into 20 bins? (like Staraniso does it) > > > Dear All > > I thought it worth adding for completeness a further point concerning > anisotropic cutoffs to Clemens's excellent explanation. I have come across > claims that an anisotropic cutoff distorts the intensity statistics. I > will argue and demonstrate that actually the exact opposite is true: NOT > using an anisotropic cutoff can bias the moments away from their expected > values based on error-free data. > > For the error-free acentric Wilson distribution it's easy to show that the > 2nd raw moment E(Z^2) of the normalised intensity Z = I / E(I) is 2. The > r'th raw moment of a random variable x is the statistical expectation (mean > value) of the r'th power of x (x^r: r need not be an integer). A > "raw moment" is one where the deviations are relative to zero as opposed to > a "central moment" where deviations are relative to the mean (the mean is > the 1st raw moment, the variance is the 2nd central moment, the skewness is > the 3rd central moment etc.). Moments have an important practical > application in the detection of twinning. > > In reality there will always be experimental errors and if we assume the > usual approximation of Gaussian random errors in the intensities, one can > show that the 2nd raw moment of the normalised intensity for the acentric > Wilson distribution is: > > E(Z^2) = 2 + <sigma(I)^2> / S^2 > > where S is the Wilson distribution parameter. This means that for weak > data where sigma(I) < S, the errors have the effect of inflating the > moments (one can show by generalising the above expression that the effect > is even greater for higher order moments r > 2). Since the effect of > twinning is to reduce all the moments with r > 1 (i.e. the twinned > intensity distribution has fewer extreme values compared with the untwinned > case), the effect of random errors will be to mask the effects of twinning > in the tests. > > For my test data (courtesy of Eleanor), STARANISO gives anisotropic > diffraction limits (1.96, 1.62, 1.40) so that for reflections at lower > resolution than the lowest d* (1/1.96) the data are almost complete and the > effect of weak data on the moments is negligible. Between the lower and > upper limits (d* = 1/1.96 to 1/1.40) the proportion of weak data inside a > sphere will progressively increase causing the moments to increase > according to the equation above. Beyond the bounding sphere of radius d* = > 1/1.4 where all the data are weak, the moments tend to infinity as S > approaches zero, so one definitely does not want to include those, but by > the same argument one does not want to include weak data between the > anisotropic cutoff surface and the bounding sphere either. This is where > the anisotropic cutoff comes in. The usual isotropic completeness will > progressively decrease between these limits, whereas the more relevant > anisotropic completeness (defined as the proportion of possible observed > reflections that are actually observed) will remain high, and the moments > will remain at their expected values for error-free data. It is not > possible for weak data to be observed (by definition) so they should not be > counted as 'possibles'. > > The attached plot demonstrates this for the test data (acentric > reflections only). This was run first with CTRUNCATE which applies an > anisotropy correction but not an anisotropic cutoff. CTRUNCATE outputs the > same reflections used for the plot to its output MTZ file. Then the same > data was run with STARANISO and the output of that put through CTRUNCATE > just to get comparable plots (in that case CTRUNCATE would not apply an > anisotropy correction because the data have already been > anisotropy-corrected by STARANISO). STARANISO only outputs the reflections > used in the plot. > > One can see exactly the effects described above in the plots. There are > progressively fewer reflections in the range d* = 1/1.96 to 1/1.4 in the > STARANISO plot because they have been cut by the anisotropic cutoff leaving > only the "good stuff". Note that CTRUNCATE mislabels the plots as moments > of I (<I^2>) whereas they are plainly moments of Z (<Z^2> = <I^2> / <I>^2). > > One final point: the equation shows that tests should always be performed > on expectations NOT on measured values: this is very important. STARANISO > goes to considerable lengths to compute expectations, typically using local > averages or global solid harmonics function fitting. So a reflection is > classed as "weak" only when the local average or interpolated mean > I/sigma(I) of neighbouring reflections is low (usually < 1.2), not > necessarily when the individual measured I/sigma(I) is low. Some legacy > programs apply cutoffs using the measured I/sigma(I) or F/sigma(F) as a > criterion: never a good idea! > > Cheers > > -- Ian > (Global Phasing, Cambridge, UK). > > > > > ---------- Forwarded message --------- > > From: Clemens Vonrhein <[email protected]> > > Date: Mon, 24 Nov 2025 at 18:58 > > Subject: Re: [ccp4bb] How to divide an equal amount of reflections into > 20 > > bins? (like Staraniso does it) > > To: <[email protected]> > > > > > > Dear all, > > > > Just a few technical points and clarifications ... to avoid > > unneceessary confusion in the future (CCP4bb postings can have a long > > shelf life) :-) > > > > On Tue, Nov 18, 2025 at 06:55:15PM +0000, Martin Moche wrote: > > > As we all know STARANISO selects the best ellipse, instead of > > > sphere, when integrating reflections. > > > > STARANISO does not do any integration: that is part of the actual > > integration program reading the raw diffraction images and turning > > them into a list of unscaled+unmerged intensities (and sigmas). > > > > The next step then is to determine appropriate scale/correction > > factors to turn those into a set of scaled+unmerged > > intensities. Those could be fed into STARANISO, but more typically it > > uses the scaled+merged (using inverse-variance weighting) intensities > > to determine a cutoff surface. > > > > That cutoff can have any shape that is constrained to the point-group > > symmetry (plus a centre of symmetry): the term "anisotropic" does not > > imply that it has to be an ellipsoid - just that it is not constrained > > to be isotropic (as in a spherical cutoff surface based on per-bin > > statistics and a single high-resolution value) [1]. > > > > STARANISO fits an ellipsoid to that generic anisotropic cutoff surface > > for two main reasons: (1) providing three diffraction limits along the > > axes of that ellipsoid to have a reasonably simple description of any > > potential anisotropic diffraction, and (2) computing statistics - > > especially completeness - to accurately describe the outcome of the > > diffraction experiment. > > > > > In my experience this is good because > > > > > > > > > 1. Crystals generally diffract better in some directions than > > > others > > > 2. The electron density maps look better after STARANISO elliptical > > > data truncation, as compared to spherical data truncation in XDS, > > > XDSAPP3, DIALS etc. so it becomes easier and faster to build a > > > model and refine a structure. > > > > One caveat depending on the refinement program/procedure used: the DFc > > completion used by most (all?) refinement programs will provide purely > > model-based map coefficients to the electron density map for those > > reflections missing in the initial list of reflections in the input > > MTZ file. That is a sensible approach when those reflections are > > missing because they weren't measured although we assume they should be > > observable (think: reflections behind beamstop, in detector module > > gaps or lost due to cusp). > > > > It becomes problematic when there is no check against "observability" > > at the map computation stage and one just uses all reflections with > > Miller indices in the MTZ file to assign DFc values to the 2mFo-DFc > > electron density maps when there is no Fo value - e.g. for highly > > anisotropic data (where a large number of reflections are unobservable > > in some directions at higher resolution and those are therefore not > > present in the list of Fo). > > > > This is why BUSTER will output three different sets of > > electron-density map coefficients: > > > > 2FOFCWT / PH2FOCWT <<< for all Fo > > 2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill <<< for all HKL within the > > cutoff surface, i.e. defined as "observable" > > 2FOFCWT_iso-fill / PH2FOCWT_iso-fill <<< to the highest d* of any > > Miller index in the input file > > > > (Other refinement packages do something similar and provide several > > map coefficients). As you can see, using the last set (map > > coefficients in a sphere) can create artificially "nice" maps due to > > model bias: if you see near perfect density over the full model you > > might want to check if the correct electron density map is used > > (2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill is what we would recommend). > > > > If you used the STARANISO output file as input into refinement and on > > output you only get that last set of map coefficients (i.e. all > > reflections in a sphere to the high resolution limit), you could run > > something like the following set of commands to fix this: > > > > cad hklin1 staraniso_alldata-unique.mtz \ > > hklin2 refinement_out.mtz \ > > hklout tmp.mtz \ > > <<e > > LABI FILE 1 E1=SA_flag > > LABI FILE 2 E1=F E2=SIGF E3=FWT E4=PHWT > > e > > > > sftools <<e > > read tmp.mtz > > select col SA_flag > 0 > > calc F col 2FOFCWT_aniso-fill = col FWT > > calc P col PH2FOFCWT_aniso-fill = col PHWT > > select all > > calc F col 2FOFCWT_iso-fill = col FWT > > calc P col PH2FOFCWT_iso-fill = col PHWT > > delete col FWT > > delete col PHWT > > select col F = present > > calc F col 2FOFCWT = col 2FOFCWT_iso-fill > > calc P col PH2FOFCWT = col PH2FOFCWT_iso-fill > > select all > > write e-density.mtz > > stop > > e > > > > Now you'll have all three sets of electron-density map coefficients at > > your disposal. > > > > The above steps might be necessary especially when you run refinement > > through an interface that tries to be helpful and "completes" the > > incoming MTZ file with test-set flags right to largest d* value of any > > observed reflection [2]. Those dummy Miller indices (denoting a > > reciprocal lattice point with no associated data) will receive purely > > model-based DFc values for electron density map computations. > > > > > To my understanding AIMLESS assumes 100 % completeness when creating > > > its 20 bins, and elliptically truncated data > > > > STARANISO doesn't truncate the data ellipsoidally: it uses the > > anisotropic cutoff surface as described above (the ellipsoid is just a > > construct to make reporting easier). > > > > > has extremely low completeness in the highest resolution shells, > > > > True - if we assume that all reciprocal lattice points (Miller > > indices) within the sphere out to the high-resolution value of the > > outer shell are potentially observable. But that isn't quite what > > "completeness" tries to measure: we want to know how many reflections > > we measured relative to the number of reflections we could've > > measured. This is something quite different. > > > > If we had a crystal with perfect isotropic diffraction we are happy to > > define some outer limit (i.e. a sphere described by a radius) where we > > say: "We could have measured all reflections within that sphere and > > everything outside of that sphere is just not observable (e.g. too > > weak)". That makes complete sense: we don't want to pretend that every > > crystal diffracts to 0.5A resolution and therefore we should compute > > overall completeness always to a high-resolution limit of 0.5A > > ... that would be silly. > > > > The same applies to a crystal with anisotropic diffraction: we need to > > define a region in reciprocal space where we can say the same thing > > ("inside is observable and outside is not"). STARANISO provides us > > that shape as an ellipsoid defined by three axis lengths (diffraction > > limits). So now we compute the completeness relative to the > > reflections within that shape - not much different to the concept of a > > sphere, but more accurate in this case. > > > > > so Table 1 is looking very odd with extremely low completeness in > > > the highest resolution shell when using REFMAC5 after STARANISO. > > > > That is why STARANISO (or autoPROC) report the "Completeness > > (ellipsoidal)" in their Table1 and the deposition-ready mmCIF file [3] > > - which is what we think should be reported in papers etc. > > > > Hope that helps to clarify some of the finer details. > > > > Cheers > > > > Clemens & Ian > > > > [1] https://staraniso.globalphasing.org/anisotropy_about.html > > [2] https://staraniso.globalphasing.org/test_set_flags_about.html > > [3] see eg. https://staraniso.globalphasing.org/table1/q6/8q68.html > > > > ######################################################################## > > > > To unsubscribe from the CCP4BB list, click the following link: > > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 > > > > This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a > mailing > > list hosted by www.jiscmail.ac.uk, terms & conditions are available at > > https://www.jiscmail.ac.uk/policyandsecurity/ > > > _______________________________________________ > > proc-develop mailing list > > [email protected] > > http://wwi.globalphasing.com/mailman/listinfo/proc-develop > > > -- > > *-------------------------------------------------------------- > * Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com > * Global Phasing Ltd., 9 Journey Campus, Castle Park > * Cambridge CB3 0AX, UK www.globalphasing.com > *-------------------------------------------------------------- > > > ------------------------------ > > To unsubscribe from the CCP4BB list, click the following link: > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 > ######################################################################## To unsubscribe from the CCP4BB list, click the following link: https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/
