Hi Ian,

The 'standard' you describe below is more of a suggestion than a rule. The PDB does not enforce a numbering scheme, which is particularly annoying when dealing with engineered proteins with linkers or domains from different proteins (they come with all sorts of numbering schemes). Of course, when you use the ATOM records and distance criteria you should be able to work out what is connected and where the gaps are. Unfortunately, this is not always properly implemented in software (I recently had a nice case with a gap in an insertion in a nucleic acid that caused problems working out the connectivity). When dealing with ranges of residues, e.g. in TLS group descriptions, numbering issues with (or without) insertion codes can be a real pain because ranges can be somewhat ambiguous.

In theory, it is easy and insertion codes (or other numbering issues) should not be a problem at all. In practice, as Ed pointed out, it is a big mess.
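To make the "ATOM records plus distance criteria" idea concrete, here is a minimal Python sketch. It is not taken from any existing program. The function names parse_residues and chain_breaks are made up for illustration, and the 1.7 A cutoff on the C-N distance is an assumption (a peptide bond is roughly 1.33 A). It only handles proteins; a nucleic acid would need the O3'-P distance instead. Residues are taken strictly in file order, so the residue number and insertion code serve only as labels.

```python
import math


def parse_residues(pdb_lines):
    """Group ATOM records into residues, preserving file order."""
    residues = []  # list of (residue_id, {atom_name: (x, y, z)})
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed-column fields of the PDB ATOM record.
        res_id = (line[21], int(line[22:26]), line[26])  # chain, resSeq, iCode
        atom = line[12:16].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if not residues or residues[-1][0] != res_id:
            residues.append((res_id, {}))
        # Keep only the first position seen for each atom name (ignores altlocs).
        residues[-1][1].setdefault(atom, xyz)
    return residues


def chain_breaks(residues, max_cn=1.7):
    """Return pairs of consecutive residues (file order) that are not linked.

    max_cn is an assumed cutoff in Angstrom for the C(i)-N(i+1) distance.
    """
    breaks = []
    for (id_a, atoms_a), (id_b, atoms_b) in zip(residues, residues[1:]):
        if id_a[0] != id_b[0]:
            continue  # a new chain starts here, not a gap
        c, n = atoms_a.get("C"), atoms_b.get("N")
        if c is None or n is None or math.dist(c, n) > max_cn:
            breaks.append((id_a, id_b))
    return breaks
```

With this approach it does not matter whether 86A is listed before or after 86: whichever residue comes next in the file is tested for a sensible C-N distance, and a gap is reported only where that test fails.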
Cheers,
Robbie

> -----Original Message-----
> From: Ian Tickle [mailto:[email protected]]
> Sent: Wednesday, December 05, 2012 17:26
> To: Robbie Joosten
> Cc: [email protected]
> Subject: Re: [ccp4bb] thanks god for pdbset
>
> I had always assumed that ASCII sort order was the standard so ' 128A' comes
> after ' 128 ' in the collating sequence, and indeed the PDB documentation
> seems to make it clear that it comes after, e.g. in the section describing the
> ATOM record:
>
> REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
> ---------------------------    ----------------------------
> 59                             59
> 60                             60
> 61
> 62                             62
>
> REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
> ---------------------------    ----------------------------
> 85                             85
> 86                             86
>                                86A
>                                86B
> 87                             87
>
> But does it actually matter if the insertion comes before? Surely the
> sequence is completely defined by the file order, regardless of the residue
> numbering, not by the alphanumeric sorting order? So if 86A comes
> immediately before 86 in the file then you must assume that 86A C is linked
> to 86 N (assuming of course that the bond length is sensible); if after, then
> it's 86 C to 86A N.
>
> Cheers
>
> -- Ian
>
> On 5 December 2012 16:02, Robbie Joosten <[email protected]> wrote:
>
> Hi Ian,
>
> It's easy to forget about LINK records and such when dealing with the
> coordinates (I recently had to fix a bug in my own code for that).
> The problem with insertion codes is that they are very poorly defined in the
> PDB standard. Does 128A come before or after 128? There is no strict rule
> for that; instead they are used in order of appearance. This makes it hard
> for programmers to stick to agreed standards. Instead, people would rather
> ignore insertion codes altogether. They are really poorly supported by many
> programs. Perhaps switching to mmCIF gets rid of the problem.
>
> Cheers,
> Robbie
>
> > -----Original Message-----
> > From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of
> > Ian Tickle
> > Sent: Wednesday, December 05, 2012 16:39
> > To: [email protected]
> > Subject: Re: [ccp4bb] thanks god for pdbset
> >
> > The last time I tried the pdbset renumber command because of issues with
> > insertion codes in certain programs, it failed to also renumber the LINK,
> > SSBOND & CISPEP records. Needless to say, thanking god (or even God) was
> > not my first thought! (more along the lines of "why can't software
> > developers stick to the agreed standards?").
> >
> > I haven't tried it with the latest version, maybe it's fixed now.
> >
> > -- Ian
> >
> > On 5 December 2012 07:58, Francois Berenger <[email protected]> wrote:
> >
> > Especially the renumber command that changes
> > residue insertion codes into an increment of
> > the impacted residue numbers.
> >
> > Regards,
> > F.
