Hi all,

Just revisiting this old thread, since it would seem I owe Paul Emsley something of an apology... after discussion with various people prior to submission of my manuscript, I concluded that overly-aggressive manual model building in Coot was probably the most common source of erroneous cis bonds, and said as much in the text. Now I'm thinking they probably creep in much earlier.

I've just been looking at the output of an AutoBuild run of a 3.1 Angstrom structure (my first ever - I'm not a crystallographer by training, so I'm sort of working backwards from the improvement of existing structures to learning how to produce them), and I count 16 cis bonds in 640 residues.

If I'm reading the code correctly, I think the problem arises in RESOLVE (specifically, in assemble_model.cpp) when individual fragments are joined - overlapping fragments are trimmed back and ligated, but there's never a geometric check for the stereochemistry of the newly formed peptide bond. So, any time the data becomes a bit ambiguous there's a chance that one of the two residues at the join will be in a sufficiently incorrect orientation that the peptide bond is made the wrong way around.
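To make the missing check concrete, here is a minimal sketch of the kind of geometric test that could be run on a newly formed peptide bond. It is not taken from RESOLVE or Coot: the function names (`peptide_omega`, `classify_omega`) and the 30 degree cutoff are assumptions chosen for illustration. It computes the omega dihedral (CA, C, N, CA) across the join and labels it cis, trans or twisted.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def peptide_omega(ca_prev, c_prev, n_next, ca_next):
    """Omega dihedral in degrees for CA(i), C(i), N(i+1), CA(i+1).

    Hypothetical helper: each argument is an (x, y, z) tuple in Angstroms.
    """
    b0 = _sub(c_prev, ca_prev)
    b1 = _sub(n_next, c_prev)
    b2 = _sub(ca_next, n_next)
    n1 = _cross(b0, b1)          # normal to the CA-C-N plane
    n2 = _cross(b1, b2)          # normal to the C-N-CA plane
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = (b1[0] / b1_len, b1[1] / b1_len, b1[2] / b1_len)
    m1 = _cross(n1, b1_unit)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))

def classify_omega(omega_deg, cutoff=30.0):
    """Label a peptide bond by its omega angle.

    The cutoff is an assumed tolerance: within `cutoff` of 0 degrees is
    called cis, within `cutoff` of 180 degrees is called trans, and
    anything else is called twisted.
    """
    magnitude = abs(omega_deg)
    if magnitude <= cutoff:
        return "cis"
    if magnitude >= 180.0 - cutoff:
        return "trans"
    return "twisted"

if __name__ == "__main__":
    # Idealised planar coordinates, made up for the demonstration.
    omega = peptide_omega((-1.0, 1.0, 0.0), (0.0, 0.0, 0.0),
                          (1.33, 0.0, 0.0), (2.33, 1.0, 0.0))
    print(classify_omega(omega))
```

A fragment-joining routine could call something like this straight after ligation and flag (or rebuild) any join that comes back cis or twisted, perhaps with an exception for joins where the following residue is a proline.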
Best regards,
Tristan

________________________________
From: Tristan Croll
Sent: Tuesday, 17 February 2015 6:13 AM
To: [email protected]; CCP4 bulletin board
Subject: Re: [ccp4bb] Cis-peptide bond checking

Dear Wouter,

That does sound like a useful tool indeed - finding the proverbial needle in a haystack! That's the challenge with such a rare event: rather like a "true" Ramachandran outlier, when they do occur they're usually a sign of an important motif in your protein that should be remarked upon.

To others making the same point: yes, I'm well aware of the existence of true cis peptides, and I both re-calculate the background rate in high-res structures and briefly discuss their nature in my paper (my personal favourite example is tissue transglutaminase (2q3z), which contains two - one of which is induced by the formation of a vicinal disulfide bond. It's believed that reduction of the disulfide switches the backbone back to trans to activate the enzyme). But I'm currently unaware of any protein that contains more than 3-4 cis bonds that stand up under scrutiny, while there are many models out there with tens, or even a few hundred. For examples of erroneous assignment at high res look at 3ncq, 2gec or 2j82.

It's not such a problem at high resolution, but at lower resolutions I'm more concerned about why the cis bonds have crept into the model. Are they simple innocuous oversights (as pointed out by Robbie Joosten, most - but certainly not all - appear in poorly-defined density), or have they come about due to accidentally force-fitting a loop that is fundamentally wrong (e.g. due to an adjacent strand being out of register)? In most cases it's of course the former, but what worries me is the example of a structure I found (since corrected by the authors) that had 86 cis bonds (1.4%), yet only 0.4% Ramachandran and RSRZ outliers.
In a "good" structure one would expect an erroneous cis bond to introduce an outlier in some other metric - but it seems equally possible that in a "bad" structure it could bring an outlier back into a favoured region.

Hope this clarifies my point.

Cheers,
Tristan

________________________________
From: [email protected] <[email protected]>
Sent: Monday, 16 February 2015 9:55 PM
To: Tristan Croll; [email protected]
Subject: Re: [ccp4bb] Cis-peptide bond checking

Dear Tristan,

Thank you for your post of earlier today regarding the problem of cis and trans peptide planes in the PDB. We also realised this problem a while ago and an article describing this problem and a solution is presently under review at Acta Cryst. D. After analysis of the PDB we can state with >95% certainty that ~4600 trans -> cis flips in ~2800 entries (and ~70K peptide-plane flips) are needed in the PDB. Around a third of the trans -> cis corrections concern non-prolines. We hope to be able to deal with the problem of cis -> trans corrections later.

In the tradition of our group, the software to detect these flips is already available at swift.cmbi.ru.nl. Hopefully, the referees of our article consider this topic just as important as you and I do :-).

Kind regards,
Wouter Touw and Gert Vriend

On 02/16/2015 10:58 AM, Tristan Croll wrote:

Dear all,

My apologies for the spam-like nature of my post, but I would like to draw your attention to an important issue (outlined in an upcoming short communication to Acta D, which will appear at doi:10.1107/S1399004715000826 once it's online). At present, neither the structural quality checks in commonly-used crystallography packages nor those run on deposition of a structure to the PDB are flagging the presence of non-proline cis peptide bonds.
This has led to many erroneous cis bonds creeping into the PDB - primarily in low-resolution structures as one would expect, but I have identified clearly erroneous examples in structures with resolutions as high as 1.3 Angstroms. From my analysis, I estimate that a few thousand structures have been affected to some extent, with the worst cases having as many as 3% of their peptide bonds in cis.

Particularly if you have published anything >2.5 Angstroms in the past few years, may I gently suggest that you make a quick double-check of your deposited structures? This can be done quickly and simply in Coot (Extensions-Modelling-Residues with Cis peptide bonds).

Best regards,
Tristan

The Radboud university medical center is listed in the Commercial Register of the Chamber of Commerce under file number 41055629.
