Re: [ccp4bb] mmCIF as working format?

Ethan Merritt Wed, 07 Aug 2013 17:28:22 -0700

On Wednesday, August 07, 2013 04:54:39 pm Jeffrey, Philip D. wrote:
>  Nat Echols wrote:
> > Personally, if I need to change a chain ID, I can use Coot or pdbset or 
> > many other tools.  Writing code for
> > this should only be necessary if you're processing large numbers of models, 
> > or have a spectacularly
> > misformatted PDB file.
> 
> Problem.  Coot is bad at the chain label aspect.
> Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping 
> numbering.
> Try to change the chain label of X to A.
> I get "WARNING:: CONFLICT: chain id already exists in this molecule"


That would be a bug.  But it hasn't been true for any version of coot
that I have used.  As you say, this is a common thing to do and I am
certain I would have noticed if it didn't work. I just checked that
it isn't true for 0.7.1-pre.

What _is_ true is that renaming X to A in this case will not re-order
the residues in the file.  So if you had A1-100 followed by B1-10
followed by X101-200 there would not be a peptide link between A100 and
A(old X)101 after the renaming.
To fix this you need to write out the file and use an editor to move the
records for A101-200 to immediately after the records for A1-100.
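
For anyone who would rather script that move than do it in an editor, here
is a minimal Python sketch. It is an illustration under stated assumptions,
not a feature of coot or pdbset: the function name merge_chain is made up,
the chain ID is read from column 22 of ATOM/HETATM/ANISOU records, and TER
records, atom serial numbers, and header references are left untouched.

```python
COORD_RECORDS = ("ATOM  ", "HETATM", "ANISOU")


def chain_of(line):
    """Return the chain ID (column 22) of a coordinate record, else None."""
    if line[:6] in COORD_RECORDS and len(line) > 21:
        return line[21]
    return None


def merge_chain(lines, old, new):
    """Relabel chain `old` as `new` and move its records so they sit
    immediately after the last record that already belonged to `new`.

    Hypothetical helper: does not renumber serials, fix TER records,
    or update LINK/CONECT/TLS/SITE references.
    """
    kept, moved = [], []
    for line in lines:
        if chain_of(line) == old:
            moved.append(line[:21] + new + line[22:])
        else:
            kept.append(line)
    target = [i for i, line in enumerate(kept) if chain_of(line) == new]
    if not target:
        raise ValueError("no existing records for chain %r" % new)
    last = target[-1]
    return kept[:last + 1] + moved + kept[last + 1:]
```

Usage would be along the lines of reading the file into a list of lines,
calling merge_chain(lines, "X", "A"), and writing the result back out.
Residue order within each block is preserved as found; nothing is sorted.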

This does illustrate the point that expecting all tools to handle all
possible manipulations is unrealistic.  I think there will always be a
need for a separate tool that can do anything imaginable, whether that
tool is vi or emacs or some spiffy new mmCIF editing GUI.

The problem with this is that any tool capable of arbitrarily editing
your file is also capable of subtly mangling your file.  The current PDB
format is horribly sensitive to this.  For example if you
reorder/renumber/relabel ATOM records in a PDB file then references to them
in the header records (TLS, SITE, etc) and LINK/CONECT records will now point
to the wrong atoms.   I am not convinced that the new mmCIF format has gotten
this quite right either, at least in the examples given, but it does have the
flexibility to attach such links or properties directly to the ATOM record
where it is more likely to be carried along correctly if moved. 
That by itself is IMHO enough to justify the switch from PDB to mmCIF.
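
As a rough illustration of the stale-reference hazard, the Python sketch
below lists LINK records whose endpoints no longer match any ATOM/HETATM
record. The function names are invented for the example, and the column
positions are my reading of the fixed-column PDB layout (atom name, chain
ID, and residue number for each LINK partner). It can only catch references
to atoms that have vanished; a LINK that now lands on a different but
existing atom would pass unnoticed, which is the nastier case.

```python
def atom_key(line, start):
    """(chain, resSeq, atom name) read from fixed columns.

    `start` is the 0-based index of the 4-character atom name field:
    12 for ATOM/HETATM and the first LINK partner, 42 for the second.
    """
    name = line[start:start + 4].strip()
    chain = line[start + 9]
    resseq = int(line[start + 10:start + 14])
    return (chain, resseq, name)


def dangling_links(lines):
    """Return LINK records with at least one endpoint that matches
    no ATOM/HETATM record in `lines` (hypothetical helper)."""
    atoms = {atom_key(line, 12)
             for line in lines
             if line[:6] in ("ATOM  ", "HETATM")}
    bad = []
    for line in lines:
        if line[:6] == "LINK  ":
            padded = line.ljust(80)
            ends = (atom_key(padded, 12), atom_key(padded, 42))
            if any(end not in atoms for end in ends):
                bad.append(line)
    return bad
```

Running dangling_links() before and after an edit gives a quick, partial
sanity check; it does not look at insertion codes, alternate locations,
symmetry operators, or the TLS and SITE records mentioned above.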

        Ethan


> 
> This is (IMHO) a bizarre feature because this is exactly the sort of thing 
> you do when building structures.
> 
> Therefore I do one of two things:
> 1.  Open it in (x)emacs, replace " X " with " A " and Bob's your uncle.
> 2.  Start Peek2 - that's my interactive program for doing simple and stupid 
> things like this.  I type "read test.pdb" and "chain" and Peek2 prompts me at 
> perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM 
> transitions &c) and then "write test.pdb".   Takes less than 10 seconds.  
> CCP4i would probably still be launching, as would Phenix.
> 
> The reason I do #1 or #2 is not to be a Luddite, but to do something trivial 
> and boring quickly so I can get back to something interesting like building 
> structures, or beating subjects to death on CCP4bb.
> 
> What's lacking is an interactive, or just plain fast method in any guise, way 
> of doing simple PDB manipulations that we do tons of times when building 
> protein structures.  I've used Peek2 thousands of times for this purpose, 
> which is the only reason it still exists because it's a fairly stupid 
> program.  A truly interactive version of PDBSET would be splendid.  But, 
> again, it always runs in batch mode.
> 
> mmCIF looked promising, apropos emacs, when I looked at the spec page at:
> http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html
> because that ATOM data is column-formatted.  Cool.  However looking at 
> 6LYZ.cif from RCSB's site revealed that the XYZ's were LEFT-justified: 
> http://www.rcsb.org/pdb/files/6LYZ.cif
> which makes me recoil in horror and resolve to use PDB format until someone 
> puts a gun to my head.
> 
> Really, guys, if you can put multiple successive spaces to the RIGHT of the 
> number, why didn't you put them to the LEFT of it instead ?  Same parsing, 
> better readability.
> 
> Phil Jeffrey
> Princeton
> (using the vernacular but deathly serious about protein structure)
> 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
