They say one test is worth a thousand expert opinions, so I tried my
hand at the former.
The question is: what is the right way to treat disordered side chains?
a) omit atoms you cannot see
b) build them, and set occupancy to zero
c) build them, and "let the B factors take care of it"
d) none of the above
The answer, of course, is d).
Oh, c'mon. Yes, I know one of a, b, or c is what you've been doing your
whole life. I do it too. But, let's face it: none of these solutions
is perfect. So, the real question is not which one is "right", but
which is the least wrong?
We all know what is really going on: the side chain is flapping around.
No doubt it spends most of its time in energetically reasonable but
nevertheless numerous conformations. There are 41 "Favorable" rotamers
for Lys alone, and it doesn't take that many to spread the density thin
enough to fall below the classical 1-sigma contour level. The atoms are
still there, they are still contributing to the data, and they haven't
gone far. So why don't we "just" model that? Already, I can hear the
cries of "over-fitting!" and "observations/parameters!", "model bias!",
and "think of the children!" Believe it or not, none of these are the
major issue here. Allow me to demonstrate:
Consider a simple case where we have a Lys side chain in ten conformers.
I chose from popular rotamers, but evenly spread. That is, all 10
conformers have an occupancy of 0.10, and there is a 3-3-4 split of chi1
values between minus, plus and trans. This will give the maximum
contrast of density between CB and CG. Let us further require that
there is no strain in this ground-truth. No stretched bonds, no tortured
angles, no clashes, etc. Real molecules don't occupy such high-energy
states unless they absolutely have to. Let us further assume that the
bulk solvent works the way phenix models it, which is a probe radius of
1.1 A for both ions and aliphatics and a shrink radius of 0.9. But,
instead of running one phenix.fmodel job, I ran ten: one for each
conformer (A thru J). To add some excitement, I moved the main chain
~0.2 A in a random direction for each conformer. I then took these ten
calculated electron density maps (bulk solvent and all) and added them
together to form the ground truth for the following trials. Before
refinement, I added noise consistent with an I/sigma of 50 and cut the
resolution at 2.0 A. Wilson B is 50:
CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
0.8943    9.05   10.60  5.9           stump at CB
0.9540    9.29   11.73  6.0           single conformer, zero occupancy
0.9471   10.35   15.04  5.1           single conformer, full occupancy, refmac5
0.9523    9.78   15.61  4.9           single conformer, full occupancy, phenix.refine
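As a rough aside on why ten equally occupied conformers can drop out of a contoured map, here is a toy 1D sketch. It is not the phenix.fmodel calculation described above: the Gaussian "atom", the 6 A spread of centers, and the function name gaussian_atom are all assumptions made for illustration. Only the occupancy of 0.10 and the B factor of 50 come from the text.

```python
import numpy as np

def gaussian_atom(x, center, b_factor, occupancy=1.0):
    """Toy 1D 'atom': a normalized Gaussian whose width follows from B."""
    # B = 8*pi^2*<u^2>, so the rms displacement is u = sqrt(B / (8*pi^2)).
    u = np.sqrt(b_factor / (8.0 * np.pi ** 2))
    norm = u * np.sqrt(2.0 * np.pi)
    return occupancy * np.exp(-0.5 * ((x - center) / u) ** 2) / norm

x = np.linspace(-10.0, 10.0, 2001)  # hypothetical 1D axis, in Angstrom
dx = x[1] - x[0]
b = 50.0                            # B factor, matching the Wilson B in the text

# One fully occupied atom at the origin.
single = gaussian_atom(x, 0.0, b)

# Ten copies at occupancy 0.10 each, centers spread evenly over 6 A
# (the spread is an assumption, not a measured rotamer distribution).
centers = np.linspace(-3.0, 3.0, 10)
ensemble = sum(gaussian_atom(x, c, b, occupancy=0.10) for c in centers)

print(f"integrated density, single:   {single.sum() * dx:.3f}")
print(f"integrated density, ensemble: {ensemble.sum() * dx:.3f}")
print(f"peak height ratio (ensemble/single): {ensemble.max() / single.max():.2f}")
```

Both curves integrate to the same total, so the atoms are "still there", but the ensemble peak is far lower than the single-conformer peak, which is the spreading-thin effect in miniature.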
So, it would appear that the zero-occupancy choice "wins", but by the
narrowest of margins. Here CCtrue is the Pearson correlation
coefficient between the ground-truth right-answer electron density and
the 2fofc map resulting from the refinement. Rwork and Rfree are the
usual suspects, and fo-fc indicates the tallest peak in the difference
map. Refinement was with refmac unless otherwise indicated. I think we
often forget that both phenix and refmac restrain B factor values, not
just through bonds but through space, and they use rather different
algorithms. Refmac tries to make the histogram of B factors "look
right", whereas phenix allows steeper gradients. I also ran all 10
correct rotamers separately and picked the one with the best CCtrue to
show above. If you instead sort on Rfree (which you really shouldn't
do), you get different bests, but they are not much better (as low as
10.5%). So, the winner here depends on how you score. CCtrue is the
best score, but also unfortunately unavailable for real data.
It is perhaps interesting here that better CCtrue goes along with
worse Rfree. This is not what I usually see in experiments like this.
Rather, what I think is going on here is the system is frustrated. We
are trying to fit various square pegs into a round hole, and none of
them fit all that well.
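For readers who want to score their own simulated trials the same way, a minimal sketch of a CCtrue-style score follows. It assumes both maps are already sampled on the same grid; the function name map_correlation and the random stand-in arrays are mine, not output from refmac or phenix.

```python
import numpy as np

def map_correlation(map_a, map_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    # Subtract the means so a constant offset does not affect the score.
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# Stand-in data only: a random "ground truth" grid and a noisy copy of it
# playing the role of the 2fofc map from a refinement.
rng = np.random.default_rng(0)
truth = rng.normal(size=(16, 16, 16))
model = truth + 0.3 * rng.normal(size=truth.shape)

cc = map_correlation(truth, model)
print(f"CC between truth and noisy copy: {cc:.4f}")
```

Because the score is insensitive to overall scale and offset, it compares the shape of the density rather than its absolute level, which is why it can disagree with Rfree.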
In all cases here the largest difference peak was indicating another
place to put the Lys, so why not build into that screaming, 6-sigma
difference peak? Here is what happens when you do that:
CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
0.8943    9.05   10.60  5.9           stump at CB
0.9580    9.95   11.60  6.4           stump at CG
0.9585   10.20   12.29  6.2           stump at CG, all 10 confs
0.9543   10.61   12.24  5.3           stump at CD, all 10 confs
0.9383   10.69   14.64  4.1           stump at CE, all 10 confs
0.9476    9.66   13.48  4.6           all atoms, all 10 confs
0.9214    7.09   11.8   5.6           three conformers (worst of 120 combos)
0.9718    6.53    8.55  4.3           three conformers (best of 120 combos)
0.9710    7.17    9.44  6.1           two conformers (best of 45 combos)
0.9471   10.35   15.04  5.1           single conformer (best of 10 choices)
If I add one CG, the other two chi1 positions light up. So, I tried
building in all 10 true CG positions, and let the refinement decide what
to do with them. The clear indication was that a CD should be added.
After adding all the CDs, the difference peaks were weaker, but still
indicating more atoms were needed. Rwork and Rfree, however, tell the
opposite story. They get worse the more atoms you add. CCtrue, on the
other hand, was best when cutting everything after CG. Why is that?
Well, every time you add another atom you fill in the difference
density, but then that atom pushes back the bulk solvent model that was
filling in the density for the next atom. The atom-to-solvent distance
is roughly twice that of a covalent bond. So again, square pegs and
round holes.
Three conformers coming out as the winner may be because it is a
selective process with a noisy score. In the ground truth there are 10
conformers at equal occupancy, so no one triplet is really any better
than any other. However, one has a density shape that fits better than
other combos. My search over all possible quartets is still running.
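The combination counts quoted above, and the size of the quartet search, follow from choosing subsets of the ten conformers. A minimal sketch of such an exhaustive search is below; the score callable is hypothetical (in the real experiment each evaluation would be a full refinement followed by a CCtrue calculation), and toy_score is purely a placeholder.

```python
from itertools import combinations
from math import comb

CONFORMERS = "ABCDEFGHIJ"  # the ten conformers, A thru J

def best_subset(score, size, labels=CONFORMERS):
    """Return the highest-scoring subset of `size` conformers.

    `score` is a hypothetical callable that takes a tuple of labels and
    returns a number; higher is better.
    """
    return max(combinations(labels, size), key=score)

# Number of refinements an exhaustive search needs for each subset size.
n_trials = {size: comb(len(CONFORMERS), size) for size in (1, 2, 3, 4)}
print(n_trials)

def toy_score(subset):
    """Placeholder score, illustrative only: reward subsets with A and J."""
    return ("A" in subset) + ("J" in subset)

print(best_subset(toy_score, 3))
```

The counts come out as 10 singles, 45 pairs, 120 triplets and 210 quartets. With that many candidates and a noisy score, the best subset is partly a selection effect, as noted above.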
But what if we got the solvent "right"? Well, here is what that looks like:
CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
0.9476    9.66   13.48  4.6           all atoms, all confs, refmac defaults
0.9696    6.15    8.88  3.7           all atoms, all confs, phenix.refine
0.9825    0.80    0.89  3.9           all atoms, all confs, true solvent
0.9824    0.92    1.26  7.3           true model, minus one H atom from ordered HIS side chain
You can see that the default solvent of phenix.refine fares better than
refmac here, but since I generated the solvent with phenix it may
have an unfair advantage. Nevertheless, providing the "true solvent"
here gives quite a striking drop in R factors. This is not surprising
since this was the last systematic error in this ground truth. In all
cases, I provided the true atomic positions at the start of refinement,
so there was no confusion about strain-inducing local minima, such as
which rotamer goes with which main chain shift. And yes, you can
provide arbitrary bulk solvent maps to refmac5 using the "Fpart"
feature. I've had good luck with real data using bulk density derived
from MD simulations.
What is more, once the R factors are this low I can remove just one
hydrogen atom and it comes back as a 7.3-sigma difference peak. This
corresponds to the protonation state of that His. This kind of
sensitivity is really attractive if you are looking for low-lying
features, such as partially-occupied ligands. Some may pooh-pooh R
factors as "cosmetic" features of structures, but they are, in fact,
nothing more or less than the % error between your model and your data.
This % error translates directly into the noise level of your map. At
20% error there is no hope whatsoever of seeing 1-electron changes. This
is because hydrogen has only 17% of the electrons of a carbon. But at
3-5% error, which is a typical experimental error in crystallographic
data, anything bigger than one electron is clear.
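To make the "% error" reading of the R factor concrete, here is a minimal sketch using the conventional definition, R = sum|Fobs - Fcalc| / sum(Fobs). It assumes the amplitudes are already on a common scale; the function name r_factor and the example numbers are mine, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional R factor for amplitudes already on the same scale."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    return float(np.abs(f_obs - f_calc).sum() / f_obs.sum())

# Made-up amplitudes: total disagreement of 15 against a total of 150.
f_obs = [100.0, 50.0]
f_calc = [90.0, 55.0]
print(f"R = {100.0 * r_factor(f_obs, f_calc):.1f}%")

# Electron counts behind the "17% of a carbon" figure.
electrons = {"H": 1, "C": 6}
h_fraction = electrons["H"] / electrons["C"]
print(f"hydrogen / carbon = {100.0 * h_fraction:.0f}%")
```

The comparison in the text is between these two numbers: a model-to-data error of about 20% is larger than the roughly 17% signal of a hydrogen relative to a carbon, whereas an error of 3-5% is well below it.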
-James Holton
MAD Scientist
On 3/18/2023 2:10 PM, Nicholas Pearce wrote:
Not stupid, but essentially the same as modelling alt confs, though
would probably give more overfitting. Alt confs can easily be
converted to an ensemble (if done properly…).
Thanks,
Nick
———
Nicholas Pearce
Assistant Professor in Bioinformatics & DDLS Fellow
Linköping University
Sweden
------------------------------------------------------------------------
*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of
benjamin bax <ben.d.v....@gmail.com>
*Sent:* Saturday, March 18, 2023 10:07:26 PM
*To:* CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
*Subject:* Re: [ccp4bb] To Trim or Not to To Trim
Hi,
Probably a stupid question.
Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or
27 structures) and restrain well defined parts of structure to be
‘identical’ ? To give you a more NMR like chemically sensible ensemble
of structures?
Ben
> On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>
> Models for crystallography have two purposes: refinement and
> interpretation. Here these two purposes are in conflict. Neither case
> is handled well by either the trim or the not-trim scenario, but trimming
> results in a deficit for refinement and not-trimming results in a
> deficit for interpretation.
>
> Our computational tools are not “fixed” in the same way that the
> standard amino acids are “fixed” or your government’s bureaucracy
> pathways are “fixed”. They are open for debate and for adjustments.
> This is a fine example where it may be more productive to discuss the
> options for making changes to the model itself or its representation,
> to better account for awkward situations such as these. Otherwise we
> are left figuring out the best imperfect way to use an imperfect tool
> (as all tools are, to varying degrees!), which isn’t satisfying for
> enough people, enough of the time.
>
> I now appreciate the hypocrisy in the argument “do not trim, but
> also don’t model disordered regions”, even though I’d be keen to avoid
> trimming. This discussion has therefore softened my own viewpoint.
>
> My refinement models (as implemented in Vagabond) do away with the
> concept of B factors precisely for the anguish it causes here, and
> refine a distribution of protein conformations which is sampled to
> generate an ensemble. By describing the conformations through the
> torsion angles that comprise the protein, modelling flexibility of a
> disordered lysine is comparatively trivial, and indeed modelling all
> possible conformations of a disordered loop becomes feasible. Lysines
> end up looking like a frayed end of a rope. Each conformation can
> produce its own solvent mask, which can be summed together to produce
> a blurring of density that matches what you would expect to see in the
> crystal.
>
> In my experience this doesn’t drop the R factors as much as you’d
> assume, because blurred out protein density does look very much like
> solvent, but it vastly improves the interpretability of the model.
> This also better models the boundary between the atoms you would trim
> and those you’d leave untrimmed, by avoiding such a binary
> distinction. No fear of trimming and pushing those errors unseen into
> the rest of the structure. No fear of leaving atoms in with an
> inadequate B factor model that cannot capture the nature of the disorder.
>
> Vagabond is undergoing a heavy rewrite though, and is not yet ready
> for human consumption. Its first iteration worked on
> single-dataset-single-model refinement, which handled disordered side
> chains well enough, with no need to decide to exclude atoms. The heart
> of the issue lies in main chain flexibility, and this must be handled
> correctly, for reasons of interpretability and elucidating the
> biological impact. This model isn’t perfect either, and necessitates
> its own compromises - but will provide another tool in the structural
> biology arsenal.
>
> —-
>
> Dr Helen Ginn
> Group leader, DESY
> Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
> Luruper Chaussee 149
> 22607 Hamburg
>
> ########################################################################
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>
> This message was issued to members of http://www.jiscmail.ac.uk/CCP4BB,
> a mailing list hosted by http://www.jiscmail.ac.uk/, terms & conditions
> are available at https://www.jiscmail.ac.uk/policyandsecurity/