Re: [ccp4bb] PAIREF, Anisotropy and STARANISO

Robbie Joosten Thu, 06 Oct 2022 06:20:21 -0700

The combination of paired refinement and anisotropy should not be a problem, 
but I think there are a few catches depending on the implementation. I'll 
reason from the PDB-REDO implementation of paired refinement which was made 
assuming a "you get what you get" set of reflections without the option of 
reprocessing. This can include datasets that are severely anisotropic with 
isotropic cut-off at the highest resolution direction, datasets in which the 
detector was too close so there are missing reflections in a systematic (but 
not ellipsoidal, right?) way, and sets with missing wedges, shells, or random 
sets (e.g. the test set) of reflections that could have been observed/deposited.


To me, paired refinement is a very elegant way to find out whether an 
additional set of (higher resolution) reflections carries information that a 
refinement program can use to improve its results. Its introduction (thanks, 
Kay) is a big step forward in the eternal discussions on resolution cut-offs. 
There need not be any assumption of (an)isotropy at this stage. In the PDB-REDO 
implementation we do make another assumption: the higher the resolution, the 
noisier the data get. That is, the usable information per reflection 
gradually goes down with resolution. 
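
To make the decision logic concrete, here is a minimal sketch of how a paired
refinement series can be walked through. It is not the PDB-REDO or PAIREF
code: the function name, the tuple layout, and the single R-free criterion
with a tolerance are assumptions for illustration only. Each step compares
two models, one refined to the lower and one to the higher resolution, with
R-free for both calculated on the same reflections up to the lower limit.

```python
def accepted_cutoff(steps, tolerance=0.0):
    """Return the highest-resolution cut-off (in A) supported by the series.

    steps: non-empty list of (d_low, d_high, rfree_low_model, rfree_high_model),
    ordered from low to high resolution. Both R-free values of a step are
    assumed to be computed at d_low, so the comparison is like-for-like.
    (Hypothetical layout; real programs use additional criteria.)
    """
    cutoff = steps[0][0]              # start from the initial cut-off
    for d_low, d_high, rfree_low, rfree_high in steps:
        if rfree_high - rfree_low > tolerance:
            break                     # the extra shell made the model worse
        cutoff = d_high               # the extra shell helped, or did no harm
    return cutoff
```

The series stops at the first step that does not help, which matches the
assumption that later steps carry less usable information than earlier ones.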

This has a consequence for how to choose your resolution steps in refinement 
(IMO). To make sure the useful information content per step goes down, you 
shouldn't use equal steps in resolution or any other type of binning that 
results in unequal numbers of reflections per step. Instead, taking steps of 
equal numbers of reflections ensures that the steps become gradually worse in 
terms of usability. The added bonus is that steps of equal numbers of 
reflections also ensure that each step has a very similar number of test set 
reflections, which are needed for establishing the final resolution cut-off. 
The only real exception I can think of is when one step happens to have a 
serious, but untreated, ice ring and the next one doesn't. 
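
The equal-count stepping can be sketched as follows. This is an illustration
under assumptions, not the PDB-REDO implementation: the function name and its
arguments are hypothetical, and the input is simply the d-spacing (in A) of
every reflection in the file.

```python
import numpy as np


def equal_count_steps(d_spacings, d_start, n_steps):
    """Split the reflections beyond d_start into steps of equal size.

    Returns a list of (high_resolution_edge, n_reflections), ordered from
    the lowest-resolution step to the highest-resolution one.
    """
    d = np.asarray(d_spacings, dtype=float)
    # Reflections beyond the starting cut-off, lowest resolution first.
    extra = np.sort(d[d < d_start])[::-1]
    # array_split gives chunks whose sizes differ by at most one reflection.
    chunks = np.array_split(extra, n_steps)
    return [(float(c[-1]), len(c)) for c in chunks if len(c) > 0]
```

Because the number of reflections grows quickly with resolution, the steps
produced this way get narrower in d-spacing towards the high-resolution end,
while the number of (test set) reflections per step stays about the same.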

In practice, this approach works well and we see that paired refinement is a 
very popular option in PDB-REDO (10% of the calculations include paired 
refinement at the request of the users). It has been used a lot to 'shut up 
referee #2' at the review stage, but also in the early stages of model 
building. I really recommend against the latter, as refinement is just less 
stable at that stage and the distinction between resolution steps is small at 
best. Anyway, I think there is little to no need to make assumptions about 
data (an)isotropy in paired refinement, but one could make a case for a 
STARANISO-like approach or any other anisotropic selection of 'observed' 
reflections. You 'poison' your resolution steps less with reflections that 
carry only noise. Whether this affects the chosen resolution cut-off in any 
meaningful way is yet to be determined (happy to help with that). That said, I 
do agree with the previous posts that you shouldn't treat your data in any 
other way than selection.
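
As an illustration of "selection only", here is a sketch of the simplest
anisotropic selection I can think of: an ellipsoidal cut-off. This is a
deliberate simplification and not what STARANISO does (its cut-off surface is
based on the local average of I/sig(I) and need not be an ellipsoid). The
function name is hypothetical, and the reciprocal-space vectors are assumed
to be already expressed along the principal axes of the anisotropy.

```python
import numpy as np


def inside_ellipsoid(s_vectors, d_limits):
    """Boolean mask of reflections inside an ellipsoidal resolution cut-off.

    s_vectors: (N, 3) reciprocal-space vectors in 1/A, with |s| = 1/d,
               given along the three principal axes (assumption).
    d_limits:  the three diffraction limits in A along those axes.
    Intensities and sigmas are not touched; reflections are only selected.
    """
    s = np.asarray(s_vectors, dtype=float)
    d = np.asarray(d_limits, dtype=float)
    # Semi-axes of the ellipsoid are 1/d_limits, so s_i * d_i = s_i / (1/d_i).
    return np.sum((s * d) ** 2, axis=1) <= 1.0
```

With limits of, say, 1.8, 2.5 and 3.0 A, a 2.0 A reflection along the first
axis is kept while a 2.0 A reflection along the second axis is dropped, so
the paired refinement steps would only be filled from the directions that
still diffract.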

Cheers,
Robbie

> -----Original Message-----
> From: CCP4 bulletin board <[email protected]> On Behalf Of Kay
> Diederichs
> Sent: Thursday, October 6, 2022 12:33
> To: [email protected]
> Subject: Re: [ccp4bb] PAIREF, Anisotropy and STARANISO
> 
> Dear Gerard,
> 
> I'm not going to comment on what others said in this (new) thread; just trying
> to make a few remarks about what you write below -
> 
> On Tue, 4 Oct 2022 17:01:10 +0100, Gerard Bricogne
> <[email protected]> wrote:
> 
> >Dear all,
> >
> >     First of all, apologies for breaking the threads entitled "PAIREF -
> >Warning - not enough free reflections in resolution bin" and "Anisotropy" by
> >merging them into a new one, but it somehow felt rather against nature to
> >keep them separate.
> >
> >     Since the early days of the availability of STARANISO [1] (the actual
> >starting year for the Web server [2] was 2016), we had a hunch that much of
> >what was happening in the PAIREF procedure might simply be the detection of
> >the existence of significant data beyond an initially chosen resolution
> >cut-off not only as a result of an excessively conservative criterion having
> >been applied in that initial choice, but as a consequence of anisotropy in
> >the data.
> 
> Why "much of what was happening ... as a consequence of anisotropy"? These
> words imply that datasets where PAIREF indicates "existence of significant
> data beyond an initially chosen resolution cut-off" (EOSDBAICRC) are
> anisotropic, but that is a) not the case, because PAIREF - or paired
> refinement in general - in my experience, and that of others, often
> indicates EOSDBAICRC also for isotropic data, b) this depends on the initial
> cutoff. So your general statement (or hunch?) cannot be correct.
> 
> > The latter would give rise to different diffraction limits in
> >different directions, and the choice of a single value for "the resolution"
> >at which the data were cut off would necessarily yield a compromise value
> >between the best and the worse diffraction limits. This would imply that
> >significant data would be excluded in the best diffracting directions, that
> >would subsequently drive PAIREF towards increasing the estimated resolution
> >compared to its compromise value.
> >
> 
> In its current implementation, PAIREF tries to determine the isotropic
> resolution cutoff that gives the best model based on valid comparisons of
> (mainly) Rfree values (it also gives other information to the user).
> This is the correct thing to do for isotropic data, and still useful for
> moderately anisotropic data, but clearly there is room for improvement, e.g.
> by using an anisotropic high-resolution cutoff, or by using data from
> STARANISO, or ...
> We (the authors of the PAIREF paper) have been discussing the treatment of
> anisotropy in the past, but we were under the impression that there is not an
> obvious single best way to deal with anisotropy.
> 
> >     This "hunch" was validated by a detailed comparison carried out on the
> >exact same examples that are considered in the 2020 paper by Maly et al.,
> >that is summarised in the attached PDF. In other words, whenever anisotropy
> >is present in the data, PAIREF will tend to indicate a higher value for an
> >isotropic cut-off than would have been estimated for the initial dataset.
> 
> based on what?? Different people employ different initial resolution cut-offs,
> based on their prior experience.
> Your general statement above assumes a certain decision mode that I'd say is
> not universally valid.
> 
> >The problem with taking the PAIREF result as the final answer is that the
> >higher cut-off it indicates is applied *isotropically*. The inclusion of the
> >significant data thus reclaimed is therefore unavoidably accompanied by that
> >of noisy data in the worst diffracting direction(s), resulting in alarmingly
> >poor statistics in the outermost shell (as pointed out in Eleanor's message)
> >that may cast doubts on the usefulness of the procedure.
> 
> To my understanding, Eleanor's message was not about PAIREF, but you cite it
> as if it were. I don't like this.
> 
> > This consideration
> >was the basis of the rationale for implementing an *anisotropic* cut-off
> >surface in STARANISO, so that one could thus reclaim the significant data in
> >the best-diffracting direction(s) while avoiding the simultaneous inclusion
> >of the pure-noise measurements in the worse one(s). While this is clearly
> >and extensively explained in the documentation provided on the STARANISO
> >server [2], it seems to be far from having been assimilated. Of course this
> >would be perfect material for a publication, but life is somehow too short,
> >and our to-do list has remained too long, to leave us room for spending the
> >necessary time to go through the process of putting a paper together. The
> >truly important matter is to get our picture in front of the user community.
> 
> It would actually be good to have a proper paper!
> 
> STARANISO is a very valuable program. I do use it a lot, and have seen great
> improvements in maps. But there are open questions.
> First, there is always a danger associated with modifying experimental data,
> so I'm not sure I like the default of STARANISO that leads to an up-scaling
> of data along the weak direction(s). I'd rather see this up-scaling
> implemented in the refinement program(s) which write out the coefficients
> for map calculation.
> Second, (from the POV of Randy Read not an open question IIUC) STARANISO
> data should not be used for MR in Phaser.
> Third, I'd like to know if substructure solution works better with data from
> STARANISO than with the original data.
> Fourth, to me a (STARANISO default) cutoff of I/sigI at 1.2 is arbitrary.
> Yes I know I can modify it, but given that the STARANISO calculation is not
> instantaneous, I'd rather have a cutoff that is variable, and is optimized
> for the given data and model - exactly what PAIREF does. Also, the sigI
> values are not very reproducible across different data processing programs.
> 
> >
> >     Now that the combined topics of PAIREF and anisotropy are being brought
> >to the foreground of the community's attention, this seems like the perfect
> >opportunity to present our analysis and position: what PAIREF achieves in
> >terms of an upward revision of an initial isotropic resolution cut-off is
> >likely to be achieved more straightforwardly by submitting the same data to
> >the STARANISO server (or using it within autoPROC [3]); and the STARANISO
> >output will have the advantage of being devoid of the large extra amount of
> >purely noisy, uninformative data that are retained in the output from PAIREF
> >according to its revised isotropic cut-off.
> 
> By saying so, you imply that the default cutoff that STARANISO uses gives the
> best results. I don't agree, for the same reasons that apply to the choice of
> high-resolution cutoffs for isotropic data - any fixed cutoff based on some
> indicator is arbitrary (why is a I/sigI cutoff of 1.2 better than 1.1 or 1.3
> or ...? is there a proof?); the cutoff must depend on the model (a bad model
> does not benefit from weak data); the cutoff must also depend on the
> refinement program - e.g. phenix.refine does not take the sigI into account.
> Paired refinement would be a better way because it informs the user about
> the consequences (on the model and its R-values in a fair comparison) of a
> certain cutoff - the cutoff does not have to be based on resolution, but
> could be based on local I/sigI or the like.
> 
> >
> >     We would very much welcome feedback on this position: indeed we would
> >like to *crowd-source* the validation (or refutation) of this conclusion. In
> >our view, continuing to use the PAIREF procedure to revise an isotropic
> >resolution cut off misses the point about the consequences of anisotropy.
> 
> Here too you imply that all datasets are anisotropic.
> 
> >The only sensible use of a PAIREF-like procedure would be to adjust the
> >cut-off threshold for the local average of I/sig(I) in STARANISO, whose
> >default value is currently 1.2 but can be reset by the user through the Web
> >server's GUI. We occasionally see datasets of very high quality for which
> >the CC_1/2 value in the outermost shell stays above 0.6 or even 0.7, and it
> >is quite plausible that further useful data could be rescued if the local
> >I/sig(I) cut-off threshold were lowered below 1.2.
> 
> The way you phrase it appears to diminish the value of a PAIREF-like
> procedure. To the contrary, I'd think it would be valuable and I'd like to
> see exactly such a procedure.
> 
> >
> >     Concerning Eleanor's view that noisy data can't hurt refinement because
> >they are properly down-weighted by the consideration of e.g. Rfree values in
> >resolution shells, we would point out that any criterion based on statistics
> >in resolution shells will be polluted if the data are anisotropic and if the
> >noisy data that STARANISO would reject are retained. That will result in
> >excessive down-weighting of the significant data that STARANISO retains,
> >hence in losing the information they contain. Perhaps this is a matter for
> >later discussion, but the main idea is that retaining pure-noise data is not
> >neutral in refinement, and that every "isotropic thinking habit" on which
> >many views are based needs to be revisited.
> 
> My view here is that the existence of a "best" resolution cutoff (e.g. as a
> minimum in Rfree) that we often see in paired refinement appears to prove
> that the inclusion of data beyond that limit is somewhat detrimental to the
> model. Meaning that inclusion of noise is not recommended - and emphasizing
> the value of cutting the data in a smart(er) way.
> 
> To summarize what I want to say: a) I don't find your assessment of the merits
> of PAIREF to be balanced. b) I think it would be worthwhile to optimize the
> data cutoff based on local I/sigI or similar - so I'd wish there were a
> combination of STARANISO and PAIREF (which you seem to see as
> non-equitable alternatives).
> 
> One more word: sorry, I don't have the time currently to continue this thread
> from my side.
> 
> Best wishes,
> Kay
> 
> >
> >
> >     With best wishes,
> >
> >Clemens, Claus, Ian and Gerard.
> >
> >
> >[1] Tickle, I.J., Flensburg, C., Keller, P., Paciorek, W., Sharff, A.,
> >    Vonrhein, C., Bricogne, G. (2018). STARANISO. Cambridge, United
> >    Kingdom: Global Phasing Ltd.
> >    https://www.jiscmail.ac.uk/cgi-bin/wa-jisc.exe?A2=ind1806&L=CCP4BB&O=D&P=3971
> >
> >[2] https://staraniso.globalphasing.org/
> >
> >[3] https://doi.org/10.1107/s0907444911007773
> >    https://www.globalphasing.com/autoproc/
> >
> >
> >########################################################################
> >
> >To unsubscribe from the CCP4BB list, click the following link:
> >https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
> >
> >This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
> >mailing list hosted by www.jiscmail.ac.uk, terms & conditions are
> >available at https://www.jiscmail.ac.uk/policyandsecurity/
> >