Dear Ethan,

Thank you for your comments! I have started a new thread, since it was unfortunate 
that I brought this up in a discussion about B-factors. What I really wanted to 
discuss is something model agnostic: how to represent uncertainty by sampling. 
I consider an ensemble model with multiple partial-occupancy molecules to still 
be one model. 

I am not sure if it is possible to use MODEL-ENDMDL loops in pdb or mmcif 
format for storing multiple crystallographic models. I assume it is already 
possible to store multiple structure factor files (for refinement, for phasing, 
different crystals etc) under the same entry. In my mind, it would be a small 
step to associate different data sets distinguished by crystal ID or data block 
with a particular model number, but maybe it is not that simple. 
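
To make the idea concrete, here is a minimal sketch of how MODEL-ENDMDL blocks in PDB-format text could be split by model number and then paired with a data set. The pairing dictionary and the names "crystal_1" and "crystal_2" are hypothetical illustrations only; this is not an existing PDB or mmCIF feature, and the two-model example text is made up.

```python
# Made-up two-model example in PDB format (one CA atom per model).
EXAMPLE_PDB = """
MODEL        1
ATOM      1  CA  ALA A   1      11.000  12.000  13.000  1.00 20.00           C
ENDMDL
MODEL        2
ATOM      1  CA  ALA A   1      11.200  12.100  12.900  1.00 22.00           C
ENDMDL
"""


def split_models(pdb_text):
    """Return {model_number: [ATOM/HETATM lines]} from PDB-format text."""
    models = {}
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # The model serial number follows the record name.
            current = int(line[6:].split()[0])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            models[current].append(line)
    return models


models = split_models(EXAMPLE_PDB)

# Hypothetical association of model numbers with data blocks / crystal IDs.
hypothetical_data_block = {1: "crystal_1", 2: "crystal_2"}

for number, atoms in sorted(models.items()):
    print(number, hypothetical_data_block[number], len(atoms), "atom record(s)")
```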

I do not want to create multiple pdb entries just to provide evidence for the 
robustness/reproducibility of crystals and crystallographic models. I would 
rather use different pdb entries for different sampling intentions: for example 
entry 1 contains all the control crystals, entry 2 contains all the crystals 
subjected to treatment A, etc. These would otherwise share identical data 
reduction and refinement protocols and most of the metadata. I am afraid I do 
not know how the PDB and associated services work internally, but I hope someone 
here can provide guidance.

Best wishes,

Gergely


Gergely Katona, Professor, Chairman of the Chemistry Program Council
Department of Chemistry and Molecular Biology, University of Gothenburg
Box 462, 40530 Göteborg, Sweden
Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
Web: http://katonalab.eu, Email: gergely.kat...@gu.se

-----Original Message-----
From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Ethan A Merritt
Sent: 29 May, 2021 19:16
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] AW: [ccp4bb] AW: [ccp4bb] (R)MS

On Saturday, 29 May 2021 02:12:16 PDT Gergely Katona wrote:
[...snip...]
 I think the assumption of independent variations per atoms is too strong in 
many cases and does not give an accurate picture of uncertainty.
[...snip...]


Gergely, you are revisiting a line of thought that historically led to the 
introduction of more global treatments of atomic displacement.
These have distinct statistical and interpretational advantages.

Several approaches have been tried over the past 40 years or so.
The one that has proved most successful is the use of TLS
(Translation/Libration/Screw) models of bulk displacement to supplement or 
replace per-atom descriptions.  As you say, a per-atom treatment is often too 
strong and is not statistically justified by the experimental data.  I explored 
this with specific examples in

   "To B or not to B?" [Acta Cryst. 2012, D68, 468-477]
    http://skuld.bmsc.washington.edu/~tlsmd/references.html

An NMR-style approach that constructs and refines multiple discrete models has 
been re-invented several times. These treatments are generally called 
"ensemble models".  IMHO they are statistically unjustified and strictly worse 
than treatments based on higher level descriptions such as TLS or normal-mode 
analysis.
X-ray data is qualitatively different from NMR data, and optimal treatment of 
uncertainty must take this into account.

        best regards

                Ethan


> Hi,
> 
> It is enough to have Å² as unit to express uncertainty in 3D, but one can 
> express it with a single number only in a very specific case when the atom is 
> isotropic. Few atoms have a naturally isotropic distribution around their 
> mean position in very high resolution protein crystal structures. The 
> anisotropic atoms can be described by a 3x3 matrix, where each row and column 
> is associated with the uncertainty in a specific spatial direction. The 
> matrix elements are the products of the uncertainties in these directions. The 
> diagonal elements are the squared uncertainties along a single direction and 
> should always be positive; the off-diagonal elements, which combine two 
> directions, are covariances (+, 0 or -). In the end, every element has the unit 
> distance*distance and the matrix is symmetric. We cannot just take the 
> square root of the matrix elements and expect something meaningful, if for no 
> other reason than the problem of negative covariances. To calculate the square 
> root of the matrix itself, one has to diagonalize it first. The height of a 
> person in your example sounds easy to define, but the mathematical formalism 
> will not decide that for me. I can also define height as the longest chord of 
> a person or the maximum elevation of a car mechanic under a car. Through 
> diagonalization one can at least extract some interesting, intuitive, 
> principal directions. The final product, the sqrt(matrix), is not more 
> intuitive to me. To convert it to something intuitive I would have to 
> diagonalize the square-rooted matrix again. So shall we make an exception for the 
> special, isotropic description? Or use general principles for isotropic and 
> anisotropic treatments?
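
The diagonalization argument above can be sketched numerically. This is only an illustration: the tensor values are made up, and NumPy is assumed to be available. It shows that the matrix square root is built from the eigen-decomposition, and that the square roots of the eigenvalues are the r.m.s. displacements along the principal directions.

```python
import numpy as np

# Made-up anisotropic displacement tensor in Å^2 (symmetric, positive definite).
# Note the negative covariance, which rules out an element-wise square root.
U = np.array([[0.040, 0.010, -0.005],
              [0.010, 0.030, 0.002],
              [-0.005, 0.002, 0.020]])

# eigh is the eigen-solver for symmetric matrices; eigenvalues come back ascending,
# eigenvectors are the columns of the second return value.
eigenvalues, eigenvectors = np.linalg.eigh(U)

# r.m.s. displacement (Å) along each principal direction.
rms_along_axes = np.sqrt(eigenvalues)

# Matrix square root: take the root in the principal frame, then rotate back.
sqrt_U = eigenvectors @ np.diag(rms_along_axes) @ eigenvectors.T

print("principal r.m.s. displacements (Å):", rms_along_axes)
print("sqrt(U) squared reproduces U:", np.allclose(sqrt_U @ sqrt_U, U))
```

The off-diagonal elements of sqrt_U are still not directly interpretable, which is the point made above: the intuitive quantities are the principal directions and the r.m.s. values along them.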
> 
> About what B-factors are, I like to think about them as necessary model 
> parameters. Computational biologists also use them for benchmarking their 
> molecular dynamics models. They are also reproducible to the extent that one 
> can identify specific atoms just based on their anisotropic tensor from 
> independent structure determinations in the same crystal form. They are of 
> course not immune to errors and variation.
> 
> I also wonder how we can represent model parameter variation in the best way. 
> I admire NMR spectroscopists’ approach to deposit multiple samples from a 
> structural distribution. One could reproduce their conclusions without 
> assuming any sort of error model from these samples. In crystallography, we 
> have more and more distributions to deal with because we are swimming in 
> data. It is easy to sample/resample data sets from the same or different 
> crystals (SFX for example), which can lead to many replicates of structural 
> models. I cannot really justify creating multiple PDB entries for these 
> replicates; it is not good for the reader to have to work out which PDB codes 
> belong to which group of samples. Maybe it works for up to 10 structures, but 
> how about 100? Is it possible to deposit crystal structures as a chain of 
> model/data pairs under the same entry? It is possible to just make a tarball 
> and deposit it in alternative services such as Zenodo, but it would be a pity to 
> completely bypass the PDB. I can think of a more compact description of 
> structural distributions, for example mean positions and mean B-factors of 
> atoms with their associated covariance matrices, analogously to how MD 
> trajectories can be described as average structures and covariance matrices.  
> I think the assumption of independent variations per atoms is too strong in 
> many cases and does not give an accurate picture of uncertainty.
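
As a rough sketch of the compact description mentioned above, replicate models can be reduced to mean positions plus one covariance matrix over all coordinates. Everything here is synthetic: the replicate coordinates are simulated with a shared rigid shift so that atoms are deliberately correlated, and the array names are illustrative assumptions, not output from any refinement program.

```python
import numpy as np

rng = np.random.default_rng(0)
n_replicates, n_atoms = 200, 4

# Synthetic "true" mean positions in Å.
true_mean = rng.normal(scale=5.0, size=(n_atoms, 3))

# Each replicate gets a shift shared by all atoms (correlated variation)
# plus small independent per-atom noise.
shared_shift = rng.normal(scale=0.3, size=(n_replicates, 1, 3))
atom_noise = rng.normal(scale=0.1, size=(n_replicates, n_atoms, 3))
coords = true_mean + shared_shift + atom_noise  # shape (replicates, atoms, 3)

# Flatten each replicate to one row: x1, y1, z1, x2, y2, z2, ...
flat = coords.reshape(n_replicates, n_atoms * 3)

mean_positions = flat.mean(axis=0).reshape(n_atoms, 3)
covariance = np.cov(flat, rowvar=False)  # shape (3*atoms, 3*atoms)

# The per-atom 3x3 blocks on the diagonal resemble anisotropic tensors;
# the off-diagonal blocks hold the between-atom covariances that an
# independent per-atom treatment would discard.
between_atoms_xx = covariance[0, 3]  # x of atom 1 versus x of atom 2
print("covariance matrix shape:", covariance.shape)
print("x-x covariance between atoms 1 and 2:", between_atoms_xx)
```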
> 
> Best wishes,
> 
> Gergely
> 
> Gergely Katona, Professor, Chairman of the Chemistry Program Council 
> Department of Chemistry and Molecular Biology, University of 
> Gothenburg Box 462, 40530 Göteborg, Sweden
> Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
> Web: http://katonalab.eu, Email: gergely.kat...@gu.se
> 
> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Hughes, 
> Jonathan
> Sent: 28 May, 2021 14:49
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: [ccp4bb] AW: [ccp4bb] AW: [ccp4bb] AW: [ccp4bb] (R)MS
> 
> hi ian,
> yes, that aspect was in my mind, a bit, but i wanted to keep it simple. my 
> point wasn't really how the "uncertainty" parameter is derived but rather its 
> units. i can imagine that uncertainty in 3D could be expressed in Å³ (without 
> helping the naïve user much) or in Å (which to me at least seems useful), but 
> Å² (i.e. the B factor) seems neither logical nor helpful in this context, 
> irrespective of its utility elsewhere. if you just see the B factor as a 
> number, ok, you can do the √ in your head, but if it's visualized as in 
> pymol/putty larger uncertainties become exaggerated – which is another word 
> for "misrepresented".
> cheers
> j
> 
> From: Ian Tickle <ianj...@gmail.com>
> Sent: Friday, 28 May 2021 12:10
> To: Hughes, Jonathan <jon.hug...@bot3.bio.uni-giessen.de>
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] AW: [ccp4bb] AW: [ccp4bb] (R)MS
> 
> 
> Hi Jonathan
> 
> On Thu, 27 May 2021 at 18:34, Hughes, Jonathan 
> <jon.hug...@bot3.bio.uni-giessen.de> wrote:
> 
>  "B = 8π²<u²>  where u is the r.m.s. displacement of a scattering center, and 
> <...> denotes time averaging"
> 
> Neither of those statements is necessarily correct: u is the _instantaneous_ 
> displacement which of course is constantly changing (on a timescale of the 
> order of femtoseconds) and cannot be measured.  So u² is the squared 
> instantaneous displacement, <u²>  is the mean-squared displacement, and so 
> the root-mean-squared displacement (which of course is amenable to 
> measurement) is sqrt(<u²>), not the same thing at all as u.
> 
> Incidentally, the 8π² constant factor comes from Fourier-transforming the 
> Debye-Waller factor expression I mentioned earlier.
> 
> Also for crystals at least, the averaging is not only over time, it's over 
> all unit cells, i.e. the displacements are not only thermal in origin but 
> also due to spatial static disorder (instantaneous differences between unit 
> cells).
> 
> 
> it would seem to me that we would be able to interpret things MUCH more 
> easily with u rather than anything derived from u².
> So then I think what you mean is sqrt(<u²>) rather than <u²>, which seems not 
> unreasonable.
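
For a quick feel of the numbers, here is a small sketch converting an isotropic B factor to the r.m.s. displacement sqrt(<u²>), taking the relation B = 8π²<u²> exactly as quoted above. The function name is illustrative and the B values are arbitrary examples.

```python
import math


def rms_displacement_from_b(b_iso):
    """Return sqrt(<u^2>) in Å for an isotropic B factor in Å^2 (B = 8*pi^2*<u^2>)."""
    mean_square_u = b_iso / (8.0 * math.pi ** 2)
    return math.sqrt(mean_square_u)


for b in (20.0, 80.0):
    print(f"B = {b:5.1f} Å^2  ->  sqrt(<u^2>) = {rms_displacement_from_b(b):.3f} Å")
```

Because of the square root, a four-fold increase in B corresponds to only a two-fold increase in r.m.s. displacement, which is why displays scaled directly by B can exaggerate the larger values.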
> 
> Cheers
> 
> -- Ian
> 
> 
> 
> 
> 
> ________________________________
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
> 
> This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a 
> mailing list hosted by www.jiscmail.ac.uk, terms & conditions are 
> available at https://www.jiscmail.ac.uk/policyandsecurity/
> 


--
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742
