I think quite a bit of this "inconsistency" in protein structures comes from the fact that, for our larger globules, the model is much more of an approximate time and space average of something that could itself have ideal geometry. I.e., the way we are trying to represent the density is actually not that appropriate. The only "improvement" on this, I think, is the multiple-model approach.
My 2 c.

Jan

On Sat, Jan 15, 2022 at 9:29 PM James Holton <[email protected]> wrote:

>
> On 1/13/2022 11:14 AM, Tristan Croll wrote:
>
> (please don’t actually do this)
>
>
> Too late! I've been doing that for years. What happens, of course, is
> the "geometry" improves, but the R factors go through the roof. This I
> expect comes as no surprise to anyone who has played with the "weight"
> parameters in refinement, but maybe it should? What is it about our
> knowledge of chemical bond lengths, angles, and radii that is inconsistent
> with the electron density of macromolecules, but not small molecules? Why
> do macro-models have a burning desire to leap away from the configuration
> we know they adopt in reality? If you zoom in on those "bad clashes"
> individually, they don't look like something that is supposed to happen.
> There is a LOT of energy stored up in those little springs. I have a hard
> time thinking that's for real. The molecule is no doubt doing something
> else and we're just not capturing it properly. There is information to be
> had here, a lot of information.
>
> This is why I too am looking for an all-encompassing "geometry score".
> Right now I'm multiplying other scores together:
>
> score = (1+Clashscore)*sin(worst_omega)*1./(1+worst_rama)*1/(1+worst_rota)
>         *Cbetadev*worst_nonbond*worst_bond*worst_angle*worst_dihedral*worst_chir*worst_plane
>
> where things like worst_rama is the "%score" given to the worst
> Ramachandran angle by phenix.ramalyze, and worst_bond is the largest
> "residual" reported among all the bonds in the structure by molprobity or
> phenix.geometry_minimization. For "worst_nonbond" I'm plugging the
> observed and ideal distances into a Lennard-Jones 6-12 potential to convert
> it into an "energy" that is always positive.
>
> With x-ray data in hand, I've been multiplying this whole thing by Rwork
> and trying to find clever ways to minimize the product. Rfree is then, as
> always, the cross-check.
>
> Or does someone have a better idea?
>
> -James Holton
> MAD Scientist
>
>
> On 1/13/2022 11:14 AM, Tristan Croll wrote:
>
> Hard but not impossible - even when you *are* fitting to low-res density.
> See https://twitter.com/crolltristan/status/1381258326223290373?s=21 for
> example - no Ramachandran outliers, 1.3% sidechain outliers, clashscore of
> 2... yet multiple regions out of register by anywhere up to 15 residues! I
> never publicly named the structure (although I did share my rebuilt model
> with the authors), but the videos and images in that thread should be
> enough to illustrate the scale of the problem.
>
> And that was *with* a map to fit! Take away the map, and run some MD
> energy minimisation (perhaps with added Ramachandran and rotamer
> restraints), and I think it would be easy to get your model to fool most
> “simple” validation metrics (please don’t actually do this). The upshot is
> that I still think validation of predicted models in the absence of at
> least moderate-resolution experimental data is still a major challenge
> requiring very careful thought.
>
> - Tristan
>
> On 13 Jan 2022, at 18:41, James Holton <[email protected]> wrote:
>
> Agree with Pavel.
>
> Something I think worth adding is a reminder that the MolProbity score
> only looks at bad clashes, Ramachandran and rotamer outliers.
>
>
> MPscore = 0.426*ln(1+clashscore) + 0.33*ln(1+max(0,rota_out-1)) + 0.25*ln(1+max(0,rama_iffy-2)) + 0.5
>
> It pays no attention whatsoever to twisted peptide bonds, C-beta
> deviations, and, for that matter, bond lengths and bond angles. If you
> tweak your weights right you can get excellent MP scores, but horrible
> "geometry" in the traditional bonds-and-angles sense. The logic behind this
> kind of validation is that normally nonbonds and torsions are much softer
> than bond and angle restraints and therefore fertile ground for detecting
> problems.
> Thus far, I am not aware of any "Grand Unified Score" that
> combines all geometric considerations, but perhaps it is time for one?
>
> Tristan's trivial solution aside, it is actually very hard to make all the
> "geometry" ideal for a real-world fold, and especially difficult to do
> without also screwing up the agreement with density (R factor). I would
> argue that if you don't have an R factor then you should get one, but I am
> interested in opinions about alternatives.
>
> I.E. What if we could train an AI to predict Rfree by looking at the
> coordinates?
>
> -James Holton
> MAD Scientist
>
> On 12/21/2021 9:25 AM, Pavel Afonine wrote:
>
> Hi Reza,
>
> If you think about it this way... Validation is making sure that the model
> makes sense, data make sense and model-to-data fit make sense, then the
> answer to your question is obvious: in your case you do not have
> experimental data (at least in a way we used to think of it) and so then of
> these three validation items you only have one, which, for example, means
> you don’t have to report things like R-factors or completeness in
> high-resolution shell.
>
> Really, the geometry of an alpha helix does not depend on how you
> determined it: using X-rays or cryo-EM or something else! So, most (if not
> all) model validation tools still apply.
>
> Pavel
>
> On Mon, Dec 20, 2021 at 8:10 AM Reza Khayat <[email protected]> wrote:
>
>> Hi,
>>
>>
>> Can anyone suggest how to validate a predicted structure? Something
>> similar to wwPDB validation without the need for refinement statistics. I
>> realize this is a strange question given that the geometry of the model is
>> anticipated to be fine if the structure was predicted by a server that
>> minimizes the geometry to improve its statistics. Nonetheless, the journal
>> has asked me for such a report. Thanks.
>>
>>
>> Best wishes,
>>
>> Reza
>>
>>
>> Reza Khayat, PhD
>> Associate Professor
>> City College of New York
>> Department of Chemistry and Biochemistry
>> New York, NY 10031
>>
>> ------------------------------
>>
>> To unsubscribe from the CCP4BB list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>>
>

--
Jan Dohnalek, Ph.D
Institute of Biotechnology
Academy of Sciences of the Czech Republic
Biocev
Prumyslova 595
252 50 Vestec near Prague
Czech Republic

Tel. +420 325 873 758

########################################################################

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/
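
A note on the MPscore formula quoted in the thread: it is simple enough to evaluate by hand or in a few lines of Python. The sketch below is only a transcription of that formula, not MolProbity's own code. The function name `mp_score` is made up for illustration, and the reading of `rota_out` as percent rotamer outliers and `rama_iffy` as percent of residues outside the favoured Ramachandran regions is an assumption about the inputs.

```python
import math


def mp_score(clashscore: float, rota_out: float, rama_iffy: float) -> float:
    """Evaluate the MPscore formula exactly as quoted in the thread.

    clashscore: all-atom clashscore
    rota_out:   rotamer outliers, assumed to be given in percent
    rama_iffy:  residues outside favoured Ramachandran regions,
                assumed to be given in percent
    """
    return (
        0.426 * math.log(1.0 + clashscore)
        + 0.33 * math.log(1.0 + max(0.0, rota_out - 1.0))
        + 0.25 * math.log(1.0 + max(0.0, rama_iffy - 2.0))
        + 0.5
    )


if __name__ == "__main__":
    # Hypothetical input values, chosen only to exercise the function.
    print(round(mp_score(clashscore=2.0, rota_out=1.3, rama_iffy=0.0), 2))
```

The signature makes the point raised in the thread concrete: only three quantities enter the score, so bond lengths, bond angles, C-beta deviations and peptide omega angles cannot influence it.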
