Hi Robbie,

On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
> Hi Ian,
>
> It's easy to forget about LINK records and such when dealing with the
> coordinates (I recently had to fix a bug in my own code for that).
> The problem with insertion codes is that they are very poorly defined in the
> PDB standard. Does 128A come before or after 128? There is no strict rule
> for that; instead they are used in order of appearance. This makes it hard
> for programmers to stick to agreed standards, so people would rather ignore
> insertion codes altogether. They are really poorly supported by many
> programs. Perhaps switching to mmCIF gets rid of the problem.

Properly used, the PDB exchange dictionary for mmCIF can indeed sort this
out. In addition to the PDB-style residue number + insertion code, it has
an item for the residue sequence number in the chain (running from 1 to n).
The relevant item names are:

  _atom_site.pdbx_PDB_residue_no
  _atom_site.pdbx_PDB_ins_code

and:

  _entity_poly_seq.num

One thing to be careful of is cases where the insertion code is a digit
(which does happen sometimes). I have seen code many times where an
assumption is made that the insertion code is not a digit, and this
assumption is used to separate the residue number from the insertion code
(e.g. when a user is asked to enter a residue number + insertion code as a
single item). If the insertion code is a digit, this won't work.

This is easy to handle in the fixed-width PDB format, where the insertion
code has its own column:

    85
    851
    852
    86

but if it gets written to mmCIF incorrectly as:

loop_
_atom_site.pdbx_PDB_residue_no
_atom_site.pdbx_PDB_ins_code
85 .
851 .
852 .
86 .

instead of the correct:

loop_
_atom_site.pdbx_PDB_residue_no
_atom_site.pdbx_PDB_ins_code
85 .
85 1
85 2
86 .

it can be really hard to sort out later on.

Regards,

Peter.

-- 
Peter Keller                      Tel.: +44 (0)1223 353033
Global Phasing Ltd.,              Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom
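
For illustration, a minimal Python sketch of the digit insertion-code pitfall described above. It assumes standard PDB ATOM records, with the residue sequence number in columns 23-26 and the insertion code in column 27. All function names here (make_atom_line, parse_pdb_residue, naive_split, to_mmcif_pair) are hypothetical helpers, not part of any existing library. Reading the fixed-width columns keeps "85" + "1" apart, while a parser that assumes the insertion code is never a digit turns the combined label "851" into residue 851.

```python
import re


def make_atom_line(serial, res_seq, ins_code=" "):
    """Build a minimal (hypothetical, truncated) PDB ATOM line.

    Only the fields up to the insertion code are filled in:
    residue number in columns 23-26, insertion code in column 27.
    """
    return f"ATOM  {serial:5d}  CA  ALA A{res_seq:4d}{ins_code}"


def parse_pdb_residue(line):
    """Read (residue_number, insertion_code) from the fixed-width columns."""
    res_seq = int(line[22:26])         # columns 23-26 (0-based slice 22:26)
    ins_code = line[26].strip() or None  # column 27; blank means no code
    return res_seq, ins_code


def naive_split(label):
    """Fragile approach: assume the insertion code is never a digit.

    '128A' splits correctly, but '851' (residue 85, insertion code '1')
    is misread as residue 851 with no insertion code.
    """
    m = re.fullmatch(r"(-?\d+)([A-Za-z]?)", label.strip())
    if m is None:
        raise ValueError(f"cannot parse {label!r}")
    return int(m.group(1)), m.group(2) or None


def to_mmcif_pair(res_seq, ins_code):
    """Emit number and insertion code as separate mmCIF tokens ('.' if none)."""
    return f"{res_seq} {ins_code if ins_code else '.'}"


if __name__ == "__main__":
    residues = [(85, " "), (85, "1"), (85, "2"), (86, " ")]
    for i, (num, code) in enumerate(residues, start=1):
        line = make_atom_line(i, num, code)
        good = to_mmcif_pair(*parse_pdb_residue(line))
        bad = to_mmcif_pair(*naive_split(line[22:27]))
        print(f"columns: {good:6s}  naive: {bad}")
```

Keeping the number and the insertion code as two separate values from the moment they are read, rather than re-splitting a combined string later, is what avoids the ambiguity.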