On Monday, 31 May 2021 10:53:46 PDT Gergely Katona wrote:
> Dear Ethan,
> 
> Thank you for your comments! I started a new thread, it was unfortunate that 
> I brought this up in a discussion about B-factors. I really wanted to discuss 
> something that is model agnostic and how to represent uncertainty by 
> sampling. I consider an ensemble model with multiple partial-occupancy
> molecules to be still one model.

Gergely,

Your questions touch on topics that are far too broad to address
satisfactorily on the bulletin board.  I will offer a few thoughts.

- The PDB format has supposedly been deprecated in favor of mmCIF,
but let's disregard that.

- A PDB file can contain any number of models. Each is introduced
by a record with "MODEL " in columns 1-6.  The documentation says
  "The MODEL record specifies the model serial number when multiple
   structures are presented in a single coordinate entry, as is often
   the case with structures determined by NMR."
Note that it mentions NMR as an example but does not limit use
of multiple model sections to NMR experiments.

- The PDB format allows [requires?] a header record with
"EXPDTA" in columns 1-6.  This is used to identify whether the 
model coordinates in the file are supported by X-ray data, NMR data,
theoretical calculation, fiber diffraction, etc.
I don't know how long the list grew to be.

In the context of your question, this EXPDTA information is important.
For example my earlier comment that ensemble models are not
statistically justified was specifically with regard to modeling
X-ray crystal diffraction data.  Generating an ensemble to describe,
say, snapshots of an MD simulation is an entirely different story.
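
As a rough illustration of the MODEL and EXPDTA records mentioned above, here
is a minimal Python sketch that lists the experimental technique(s) and the
model serial numbers found in PDB-format text. The function name
summarize_pdb and the two-model example are made up for illustration, and the
field positions are assumptions based on the fixed-column layout described
above rather than a full parser.

```python
from typing import List, Tuple


def summarize_pdb(text: str) -> Tuple[List[str], List[int]]:
    """Return (EXPDTA techniques, MODEL serial numbers) found in PDB-format text."""
    techniques: List[str] = []
    models: List[int] = []
    for line in text.splitlines():
        record = line[:6]  # the record name occupies columns 1-6
        if record == "EXPDTA":
            # Assumption: the technique text starts at column 11 and
            # multiple techniques are separated by ";".
            techniques += [t.strip() for t in line[10:].split(";") if t.strip()]
        elif record == "MODEL ":
            # Assumption: the serial number is the first token after the name.
            models.append(int(line[6:].split()[0]))
    return techniques, models


# Hypothetical two-model fragment; the coordinates are invented.
example = """\
EXPDTA    X-RAY DIFFRACTION
MODEL        1
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00 20.00           N
ENDMDL
MODEL        2
ATOM      1  N   ALA A   1      11.300   6.000  -6.400  1.00 20.00           N
ENDMDL
"""

techniques, models = summarize_pdb(example)
print(techniques, models)
```

A check of this kind only shows that a file carries several MODEL sections
and which technique it declares; it says nothing about whether an archive
would accept such a file for a given experiment type.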

- "Uncertainty" is pretty vague.
Just sticking with crystal structures, it could mean:

1) This is definitely the mean position for this atom in this
crystal but there is uncertainty in how much individual instances
in different crystal unit cells within the lattice deviate from
this mean.

2) This is a best-effort description of the position of a ligand
atom.  However it is uncertain what fraction of the unit cells
contain the ligand at this position, or at all.

3) It is likely that this sidechain/loop/subunit is present in
different conformations in different copies of the unit cell.

4) The coordinates of this specific atom/residue/conformation
are well supported by the data for this particular crystal.
But it might be somewhere else in the next crystal from the
same crystallization drop, or in a crystal from a different
crystallization buffer, or at another temperature, or in
solution, or in the presence of a ligand, etc.
 
        best

                Ethan


> I am not sure if it is possible to use MODEL-ENDMDL loops in PDB or mmCIF
> format for storing multiple crystallographic models. I assume it is already 
> possible to store multiple structure factor files (for refinement, for 
> phasing, different crystals etc) under the same entry. In my mind, it would 
> be a small step to associate different data sets distinguished by crystal ID 
> or data block with a particular model number, but maybe it is not that 
> simple. 
> 
> I do not want to create multiple pdb entries just to provide evidence for the 
> robustness/reproducibility of crystals and crystallographic models. I would 
> rather use different pdb entries for different sampling intentions: for 
> example entry 1 contains all the control crystals, entry 2 contains all the 
> crystals subjected to treatment A, etc. These would otherwise share identical 
> data reduction and refinement protocols and most of the metadata. I am afraid 
> I do not know how the PDB and associated services work internally, but I hope
> someone here can provide guidance.
> 
> Best wishes,
> 
> Gergely
> 
> 
> Gergely Katona, Professor, Chairman of the Chemistry Program Council
> Department of Chemistry and Molecular Biology, University of Gothenburg
> Box 462, 40530 Göteborg, Sweden
> Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
> Web: http://katonalab.eu, Email: gergely.kat...@gu.se
> 
> -----Original Message-----
> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Ethan A Merritt
> Sent: 29 May, 2021 19:16
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] AW: [ccp4bb] AW: [ccp4bb] (R)MS
> 
> On Saturday, 29 May 2021 02:12:16 PDT Gergely Katona wrote:
> [...snip...]
>  I think the assumption of independent variations per atoms is too strong in 
> many cases and does not give an accurate picture of uncertainty.
> [...snip...]
> 
> 
> Gergely, you are revisiting a line of thought that historically led to the 
> introduction of more global treatments of atomic displacement.
> These have distinct statistical and interpretational advantages.
> 
> Several approaches have been tried over the past 40 years or so.
> The one that has proved most successful is the use of TLS
> (Translation/Libration/Screw) models of bulk displacement to supplement or 
> replace per-atom descriptions.  As you say, a per-atom treatment is often too 
> strong and is not statistically justified by the experimental data.  I 
> explored this with specific examples in
> 
>    "To B or not to B?" [Acta Cryst. 2012, D68, 468-477]
>     http://skuld.bmsc.washington.edu/~tlsmd/references.html
> 
> An NMR-style approach that constructs and refines multiple discrete models 
> has been re-invented several times. These treatments are generally
> called "ensemble models".  IMHO they are statistically unjustified and 
> strictly worse than treatments based on higher level descriptions such as TLS 
> or normal-mode analysis.
> X-ray data is qualitatively different from NMR data, and optimal treatment of 
> uncertainty must take this into account.
> 
>       best regards
> 
>               Ethan
> 
> 
> > Hi,
> > 
> > It is enough to have Ų as a unit to express uncertainty in 3D, but one can
> > express it with a single number only in a very specific case when the atom 
> > is isotropic. Few atoms have a naturally isotropic distribution around 
> > their mean position in very high resolution protein crystal structures. The 
> > anisotropic atoms can be described by a 3x3 matrix, where each row and 
> > column is associated with the uncertainty in a specific spatial direction. 
> > The matrix elements are the product of the uncertainty in these directions. 
> > The diagonal elements will be the square of uncertainty in the same 
> > direction and they should be always positive, the off-diagonal combination 
> > of directions are covariances (+,0 or -). In the end, every element will 
> > have a unit distance*distance and the matrix will be symmetric. We cannot 
> > just take the square root of the matrix elements and expect something 
> > meaningful, if for no other reason the problem with negative covariances. 
> > To calculate the square root on the matrix itself one has to diagonalize it 
> > first. The height of a person in your example  sounds easy to define, but 
> > the mathematical formalism will not decide that for me. I can also define 
> > height as the longest chord of a person or the maximum elevation of a car
> > mechanic under a car.  Through diagonalization one can at least extract 
> > some interesting, intuitive, principal directions. The final product, the 
> > sqrt(matrix), is not more intuitive to me. To convert it to something 
> > intuitive I would have to diagonalize the square-rooted matrix again. So shall
> > we make an exception for the special, isotropic description? Or use general 
> > principles for isotropic and anisotropic treatments?
> > 
> > About what B-factors are, I like to think about them as necessary model 
> > parameters. Computational biologists also use them for benchmarking their 
> > molecular dynamics models. They are also reproducible to the extent that 
> > one can identify specific atoms just based on their anisotropic tensor from 
> > independent structure determinations in the same crystal form. They are of 
> > course not immune to errors and variation.
> > 
> > I also wonder how we can represent model parameter variation in the best 
> > way. I admire NMR spectroscopists’ approach to deposit multiple samples 
> > from a structural distribution. One could reproduce their conclusions 
> > without assuming any sort of error model from these samples. In 
> > crystallography, we have more and more distributions to deal with because 
> > we are swimming in data. It is easy to sample/resample data sets from the
> > same or different crystals (SFX for example), which can lead to many
> > replicates of structural models. I cannot really justify creating
> > multiple PDB entries for these replicates; it is not good for the reader to
> > try to understand which PDB codes belong to which group of samples. Maybe
> > it works for up to 10 structures, but how about 100? Is it possible to
> > deposit crystal structures as a chain of model/data pairs under the same 
> > entry? It is possible to just make a tarball and deposit in alternative 
> > services such as Zenodo, but it would be a pity to completely bypass the 
> > PDB. I can think of more compact description of structural distributions, 
> > for example mean positions and mean B-factors of atoms with their 
> > associated covariance matrices, analogously how MD trajectories can be 
> > described as average structures and covariance matrices.  I think the 
> > assumption of independent variations per atoms is too strong in many cases 
> > and does not give an accurate picture of uncertainty.
> > 
> > Best wishes,
> > 
> > Gergely
> > 
> > Gergely Katona, Professor, Chairman of the Chemistry Program Council 
> > Department of Chemistry and Molecular Biology, University of 
> > Gothenburg Box 462, 40530 Göteborg, Sweden
> > Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
> > Web: http://katonalab.eu, Email: gergely.kat...@gu.se


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742
