Re: [ccp4bb] How high a B factor is too high to assume a loop is in place, in the AlphaFold era?

John R Helliwell Fri, 02 Aug 2024 08:48:09 -0700

Dear Colleagues,

I think this paper from 1979 is still very interesting:-

Crystallographic studies of the dynamic properties of lysozyme

nature.com

Have a great weekend,

John

Emeritus Professor John R Helliwell DSc

On 2 Aug 2024, at 16:29, Bohdan Schneider <[email protected]> wrote:

Hello:

yes, a great discussion! I second Eleanor's statement that B-factors of high-resolution structures do carry a message about atom flexibility. I attach a screenshot of a figure from our paper (Schneider et al.: Local dynamics of proteins and DNA evaluated from crystallographic B factors, Acta Cryst. (2014). D70, 2413–2419) that shows a clear resolution dependence of B factors at protein/protein interfaces for amino acids and waters. Our high-resolution group of structures could not be restricted to better than 1 Å, as Eleanor suggests, but even a modest cutoff at 1.9 Å, compared with structures at 1.9-2.5 Å and 2.5-3.0 Å, shows the effect clearly. We looked at several other groups of atoms (backbone/side chains in the protein core, at the protein surface, DNA phosphates/bases, waters at the interfaces or bound on the protein surface) and saw the same dependence.

Best,

Bohdan, bs.structbio.org

On 2024-08-02 13:26, Eleanor Dodson wrote:
All interesting points.. (And good to see a reference to "P.A. Machin, J.W. Campbell, M. Elder (Eds), Refinement of Protein Structures, SERC Daresbury Laboratory, Warrington, UK (1980)" - for those who remember, a super exciting discussion over what was feasible for refinement, and how to do it!)
My take - if a crystal diffracts to 1A we can be fairly sure of the accurate position of most of the coordinates, see other conformations for some regions, and give realistic B values to most atoms.
If the crystal only diffracts to 3A then the lattice is not perfect, and there must be multiple conformations for lots of the molecule.
There is not going to be sufficient experimental data to model this properly so every parameter assuming a single conformer - coordinate, B value, occupancy - is an approximation. Restraints help to some extent but they impose prior knowledge and do not glean information from the experimental data.
The "trash can" should indicate the degree of uncertainty, and interpreting that is a bit problematic. B values twice the overall B?? Hmm - do NOT place too much faith in that part of the model.. As crystallographers I think maybe we need to flag this better for trusting users of the information. Omitting that region? I am not sure.. How do others model those floppy lysines? I usually make a sort of informed guess, but indeed giving a single conformation is not the truth, the whole truth, and nothing but the truth..
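
The "twice the overall B" idea mentioned above could be turned into a quick screening script. The following is only a sketch of that tentative rule of thumb, not an established validation criterion: it assumes a fixed-column PDB-format file, uses the mean atomic B as a stand-in for the "overall B", ignores insertion codes and alternate conformers, and the function name and the default ratio of 2.0 are illustrative.

```python
from collections import defaultdict


def flag_high_b_residues(pdb_lines, ratio=2.0):
    """Return residues whose mean B exceeds ratio * mean B of all atoms.

    ASSUMPTION: "overall B" is approximated by the plain mean of the
    atomic B factors in the file; the 2.0 default is only the tentative
    rule of thumb from the message, not a validated threshold.
    """
    per_residue = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            chain = line[21]                 # column 22: chain ID
            resseq = int(line[22:26])        # columns 23-26: residue number
            resname = line[17:20].strip()    # columns 18-20: residue name
            b_iso = float(line[60:66])       # columns 61-66: B factor
            per_residue[(chain, resseq, resname)].append(b_iso)

    all_b = [b for values in per_residue.values() for b in values]
    if not all_b:
        return []
    overall_b = sum(all_b) / len(all_b)

    flagged = []
    for (chain, resseq, resname), values in per_residue.items():
        mean_b = sum(values) / len(values)
        if mean_b > ratio * overall_b:
            flagged.append((chain, resseq, resname, round(mean_b, 2)))
    return flagged
```

A call such as `flag_high_b_residues(open("model.pdb"))` (file name hypothetical) returns a list of `(chain, residue number, residue name, mean B)` tuples to inspect by eye; it says nothing about whether those residues should be deleted.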
On Fri, 2 Aug 2024 at 01:14, James Holton <[email protected]> wrote:
   I submit that modern B factor restraints make them much less trashy
   than they were in the early days. As Pavel points out the exact
   strategies differ from program to program, but I don't think anybody
   does unrestrained B factor refinement. Not by default.
   Besides, all we are really doing is fitting Gaussian-shaped peaks to
   the "curve" of the data. These peaks have a width and a height. For
   example, a carbon atom with B=20 has a peak density of 1.6 e-/A^3
   and a full-width-at-half-max (FWHM) of 1.4 A. That is it! That is
   the model density being fit. If you increase to B=80 the peak drops
   to 0.3 e-/A^3 and the FWHM increases to 2.6 A. At the largest B you
   can stuff into a PDB file (999.99), the peak height is 0.008 e-/A^3
   and the "peak" is 8.45A wide. Your disordered loop, however, is
   probably not sampling from a symmetric Gaussian distribution like
   that. This is the real problem with large B factors. They can fit
   better than sharper B atoms, but that doesn't mean they fit well.
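
The peak heights and widths quoted above can be roughly reproduced with a few lines of Python. This is a minimal sketch, not necessarily the calculation behind the numbers in the message: it assumes an isolated carbon atom described by the four-Gaussian Cromer-Mann form-factor approximation, with each Gaussian broadened by an isotropic B factor. The coefficient values are quoted from memory of the standard tables and should be checked against International Tables for Crystallography Vol. C before being relied on; the function names are illustrative.

```python
import math

# Cromer-Mann approximation for neutral carbon:
#   f(s) = sum_i a_i * exp(-b_i * s^2) + c,  with s = sin(theta)/lambda
# ASSUMPTION: coefficients quoted from memory; verify against
# International Tables for Crystallography Vol. C.
CARBON_A = (2.3100, 1.0200, 1.5886, 0.8650)
CARBON_B = (20.8439, 10.2075, 0.5687, 51.6512)
CARBON_C = 0.2156


def carbon_density(r, b_factor, occupancy=1.0):
    """Model electron density (e-/A^3) at distance r (A) from the atom.

    Each form-factor Gaussian transforms to a real-space Gaussian whose
    width is set by (b_i + B); b_factor must be > 0 because the constant
    term c has b = 0.
    """
    terms = list(zip(CARBON_A, CARBON_B)) + [(CARBON_C, 0.0)]
    rho = 0.0
    for a, b in terms:
        width = b + b_factor
        rho += (a * (4.0 * math.pi / width) ** 1.5
                * math.exp(-4.0 * math.pi ** 2 * r * r / width))
    return occupancy * rho


def fwhm(b_factor):
    """Full width at half maximum (A) of the peak, found by bisection."""
    half = 0.5 * carbon_density(0.0, b_factor)
    lo, hi = 0.0, 50.0  # the density falls monotonically with r
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if carbon_density(mid, b_factor) > half:
            lo = mid
        else:
            hi = mid
    return 2.0 * lo


if __name__ == "__main__":
    for b in (20.0, 80.0, 999.99):
        print(f"B={b:7.2f}  peak={carbon_density(0.0, b):.3f} e-/A^3  "
              f"FWHM={fwhm(b):.2f} A")
```

Under these assumptions the B=20 peak comes out near 1.6 e-/A^3 with a FWHM a little under 1.4 A. Passing `occupancy=0.5` halves the height at every radius, so the FWHM is unchanged, whereas raising B lowers the peak and widens it at the same time.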
   Occupancy is easy because all it does is scale the height without
   affecting the width. So, a 0.5-occupancy atom's peak is half the
   height of a full-occupancy one. The width is unchanged. B factors
   impact both width and height because they must preserve the number
   of electrons in the peak. This is perhaps why they are often
   confusing and mysterious. We should also never forget that bulk
   solvent gets excluded with exactly the same radii rules from every
   modeled atom, regardless of B factor and occupancy. So, the "change
   in density" from adding or deleting an atom is a little more
   complicated than adding or subtracting a Gaussian peak.
   Nevertheless, if you want to fit peak height and width independently
   (like we do in pretty much every other kind of curve fitting), then
   you should refine occupancy and B factors at the same time.
   Over-fitting you say? Hardly. Polynomials are easy to over-fit, but
   not Gaussians. Observations/parameters is a useful guide for
   polynomial fits, but in general the hallmark of over-fitting is that
   the prediction passes exactly through all the observed points (and
   not the cross-validation or "Rfree" points). I have never seen a
   macromolecular refinement end up with Rwork = 0. Have you?
   At the end of the day, what we do with our models is look at their
   parameters and try to extract the physically meaningful reality they
   are trying to capture. Restraints are very helpful in preventing
   many types of unrealistic situations, but ultimately it is up to you
   to decide if the fitted model makes sense.
   -James Holton
   MAD Scientist
   On 7/30/2024 11:30 AM, Ian Tickle wrote:

   Obviously no refined parameters can ever be completely error-free,
   it's just that for the co-ordinates we have very accurate
   geometric restraints so that the relative uncertainty in the
   refined co-ordinates is small (but try refining co-ordinates
   without restraints!). For the B factors we don't have accurate
   estimates (if any) for their restraints so their relative
   uncertainty after refinement is much greater.

   -- Ian

   On Tue, Jul 30, 2024 at 6:57 PM Oganesyan, Vaheh <[email protected]> wrote:

       Yes, it is, and I like the definition of a shared “trash bin”.
       It will have more physical meaning if we can separate those
       contributions into separate bins.

       Vaheh

       *From:* Pavel Afonine <[email protected]>
       *Sent:* Tuesday, July 30, 2024 1:51 PM
       *To:* Oganesyan, Vaheh <[email protected]>
       *Cc:* [email protected]
       *Subject:* Re: [ccp4bb] How high a B factor is too high to
       assume a loop is in place, in the AlphaFold era?

       Vaheh,

       I think coordinates are no different from B factors,
       occupancies, f', or f'' in this respect. Coordinates can play
       their "trash bin" role by adjusting to the noise at the
       expense of violated geometry (bonds, angles, planes, torsions,
       etc.). As I mentioned in my previous email, their trash bin
       capacity is much smaller (but definitely not zero!) because
       the number and strength (confidence) of geometry restraints
       are much greater than those of ADP restraints.

       I agree that all refined parameters share this trash bin
       capacity, but to varying degrees. Isn't this essentially what
       we call the error on the refined parameter? All refined
       parameters have their error bars, which we have referred to as
       the "trash bin" in this thread.

       Pavel

       On Tue, Jul 30, 2024 at 10:09 AM Oganesyan, Vaheh
       <[email protected]> wrote:

           Your point is taken, Pavel. However, regardless of
           resolution, you define the coordinates of an atom as a
           geometric point with no width. Although coordinates are
           “refinable”, they have no capacity for “trash”. Their
           “trash” still goes into the B-factor “trash bin”. At least
           this is how I see it.

           Thank you.

           *Vaheh Oganesyan, Ph.D.*
           *R&D | Biologics Engineering*
           One Medimmune Way, Gaithersburg, MD 20878
           T: 301-398-5851
           [email protected]

           *From:* Pavel Afonine <[email protected]>
           *Sent:* Tuesday, July 30, 2024 11:45 AM
           *To:* Oganesyan, Vaheh <[email protected]>
           *Cc:* [email protected]
           *Subject:* Re: [ccp4bb] How high a B factor is too high to
           assume a loop is in place, in the AlphaFold era?

           From this perspective, all refinable atomic model
           parameters can be viewed as trash bins, with the size of
           these bins being proportional to the amount of prior
           information (restraints) imposed on these parameters. For
           example, coordinates have the most restraints and thus are
           the smallest trash bins, while B factors have the least
           restraints and thus are one of the largest bins.

           Pavel

           On Tue, Jul 30, 2024 at 8:25 AM Oganesyan, Vaheh
           <[email protected]> wrote:

               Early in my crystallography life I was a postdoc with
               Robert Huber in Munich. We had gatherings once a week
               where, in a very informal way, we could ask and answer
               questions. I remember my question about B factors: how
               is it possible to have a high-resolution structure with
               an average B-factor of 100 A^2? I think it was Robert or
               Albrecht Messerschmidt who told me that the B-factor is
               a “trash can” that describes not only loosely positioned
               atoms but also all the other problems that you either
               created during processing or harvesting, or that the
               crystal had from the beginning.


               *From:* CCP4 bulletin board <[email protected]> *On Behalf Of* James Holton
               *Sent:* Tuesday, July 30, 2024 10:35 AM
               *To:* [email protected]
               *Subject:* Re: [ccp4bb] How high a B factor is too
               high to assume a loop is in place, in the AlphaFold era?

               How high B factors can go depends on the refinement
               program you are using.

               In fact, my impression is that the division between
               the "let the B factors blow up" and "delete the
               unseen" camps is correlated to their preferred
               refinement program. You see, phenix.refine is
               relatively aggressive with B factor refinement, and
               will allow "missing" atoms to attain very high B
               factors. Refmac, on the other hand, has restraints
               that try to make B factor distributions look like
               those found in the PDB, and so tends to keep nearby B
               factors similar. As a result, you may get "red
               density" for disordered regions from refmac, inviting
               you to delete the offending atoms, but not from
               phenix, which will raise the B factor until the
               density fits.

               Then there are programs like VagaBond that don't
               formally have B factors, but rather let an ensemble of
               chains spread out in the loopy regions you are
               concerned about. This might be the way to go?

               You can also do ensemble refinement in the latest
               Amber. That is, you run an MD simulation of a unit
               cell (or more) and gradually increase structure factor
               restraints. This would probably result in the "fan" of
               loops you have in mind?

               -James Holton
               MAD Scientist

               On 7/28/2024 8:13 AM, Javier Gonzalez wrote:
                   Dear CCP4bb,

                   I'm refining the ~3A crystal structure of a big
                   protein, largely composed of alpha helices
                   connected by poorly-resolved loops.

                   In the old pre-AlphaFold (AF) days I used to
                   simply remove those loops/regions with too high B
                   factors, because there was little to no density
                   at 1 sigma in a 2Fo-Fc map.

                   However, considering that the quality of a
                   readily-computable AF model is comparable to a 3A
                   experimental structure, and that the UniProt
                   database is flooded with noodle-like AF models, I
                   was considering depositing a combined model in the
                   PDB.

                   Once R/Rfree reach a minimum for the model
                   truncated in poorly resolved loops, I would
                   calculate an augmented model with AF calculated
                   missing regions (provided they have an acceptable
                   pLDDT value), assign them zero occupancy, and run
                   only one cycle of refinement to calculate the
                   formal refinement statistics.
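
The "assign them zero occupancy" step in the proposal above could be scripted along the following lines. This is a minimal sketch under stated assumptions: the combined model is a fixed-column PDB-format file, the grafted AF regions are identified by chain and residue-number range, and the function name is hypothetical. It only edits the occupancy field; it does not address what B factor those atoms should carry, how a given refinement program treats zero-occupancy atoms, or whether such a model is acceptable for deposition.

```python
def zero_occupancy(pdb_lines, chain, first_res, last_res):
    """Return a copy of pdb_lines with occupancy set to 0.00 for every
    ATOM/HETATM record in the given chain and residue-number range.

    Occupancy sits in columns 55-60 of a PDB record (slice 54:60);
    everything else on the line, including the B factor, is left as is.
    Insertion codes are ignored in this sketch.
    """
    edited = []
    for line in pdb_lines:
        if (line.startswith(("ATOM  ", "HETATM"))
                and line[21] == chain
                and first_res <= int(line[22:26]) <= last_res):
            line = line[:54] + "  0.00" + line[60:]
        edited.append(line)
    return edited
```

For example, `zero_occupancy(lines, "A", 120, 135)` would mark a modelled loop spanning residues 120-135 of chain A (numbers chosen purely for illustration) while leaving the experimentally supported atoms untouched.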

                   Would that be acceptable? Has anyone tried a
                   similar approach?

                   I'd rather do that instead of depositing a
                   counterintuitive model with truncated regions that
                   few people would find useful!!

                   Thank you for your comments,

                   Javier
                   --
                   Dr. Javier M. González
                   Instituto de Bionanotecnología del NOA
                   (INBIONATEC-CONICET)
                   Universidad Nacional de Santiago del Estero (UNSE)
                   RN9, Km 1125. Villa El Zanjón. (G4206XCP)
                   Santiago del Estero. Argentina

                   Tel: +54-(0385)-4238352

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/
<Figure_2_Acta Cryst. (2014).D70,2413.png>
