All:

Although I also like Dale's idea of paying attention to HETNAM records, I think 
we should stay in concordance with PDB rules and use HETNAM records only for 
IUPAC names.  Fortunately, the PDB also has HETSYN records and that is where we 
place the BMS name (number) of a compound when we store the coordinates for 
internal use.  The PDB also uses HETSYN records for this type of information. 
Currently, we add this information only at the time we store coordinates in our 
internal database, but there is no reason we couldn't do so earlier. I am 
reasonably certain GlobalPhasing, Ltd.'s BUSTER refinement program would carry 
this along and retain it in the output PDB file (input to COOT), but I can't 
speak to what REFMAC or PHENIX.REFINE would do.  Here is an example, based, 
with a little poetic license, on PDB entry 1Z6E:

HETNAM     IK8 1-(3-AMINO-1,2-BENZISOXAZOL-5-YL)-N-(4-{2-
HETNAM   2 IK8  [(DIMETHYLAMINO)METHYL]-1H-IMIDAZOL-1-YL}-2-
HETNAM   3 IK8  FLUOROPHENYL)-3-(TRIFLUOROMETHYL)-1H-PYRAZOLE-5-
HETNAM   4 IK8  CARBOXAMIDE
HETSYN     IK8 BMS-561389

[BMS does publish compound numbers for molecules that enter clinical trials, 
whence my ability to use it here. The poetic license consists of removing the 
other information in the HETSYN records, which are a different IUPAC name 
(IUPAC names are not unique), the generic name of the compound (RAZAXABAN), and 
another numeric designation for the compound (DPC906).]
[Of course, as explained in my previous email, for internal storage, we would 
use the identifier LG1 rather than the IK8 used in entry 1Z6E].
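
To make the HETNAM/HETSYN idea concrete, here is a minimal Python sketch of how a program could read those records and look up a residue code by its compound number. It assumes the fixed-column layout of the PDB format (residue code in columns 12-14, text in columns 16-70, semicolon-separated synonyms); the function names are my own invention and do not come from COOT, BUSTER, or any other package.

```python
from collections import defaultdict


def _join_continuations(pieces):
    """Join continuation lines; add no space after a piece ending in a hyphen."""
    text = ""
    for piece in pieces:
        if text and not text.endswith("-"):
            text += " "
        text += piece
    return text


def parse_het_records(pdb_lines):
    """Return ({het_id: name}, {het_id: [synonyms]}) from HETNAM/HETSYN lines."""
    names = defaultdict(list)
    synonyms = defaultdict(list)
    for line in pdb_lines:
        record = line[0:6]
        if record not in ("HETNAM", "HETSYN"):
            continue
        het_id = line[11:14].strip()   # columns 12-14
        text = line[15:70].strip()     # columns 16-70
        (names if record == "HETNAM" else synonyms)[het_id].append(text)
    het_names = {h: _join_continuations(p) for h, p in names.items()}
    het_synonyms = {
        h: [s.strip() for s in _join_continuations(p).split(";") if s.strip()]
        for h, p in synonyms.items()
    }
    return het_names, het_synonyms


def het_id_for_synonym(het_synonyms, compound_number):
    """Hypothetical helper: find the residue code that carries a given synonym."""
    for het_id, syns in het_synonyms.items():
        if compound_number in syns:
            return het_id
    return None


example = [
    "HETNAM     IK8 1-(3-AMINO-1,2-BENZISOXAZOL-5-YL)-N-(4-{2-",
    "HETNAM   2 IK8  [(DIMETHYLAMINO)METHYL]-1H-IMIDAZOL-1-YL}-2-",
    "HETNAM   3 IK8  FLUOROPHENYL)-3-(TRIFLUOROMETHYL)-1H-PYRAZOLE-5-",
    "HETNAM   4 IK8  CARBOXAMIDE",
    "HETSYN     IK8 BMS-561389",
]
het_names, het_synonyms = parse_het_records(example)
print(het_synonyms["IK8"])
print(het_id_for_synonym(het_synonyms, "BMS-561389"))
```

The hyphen rule for joining continuation lines is only a heuristic that happens to suit this example; names that wrap at a space would need more care.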

If it is very little work for Paul to implement the connection between a 
compound ID and a particular CIF restraint file, then that makes this an 
extremely attractive solution.

However, I don't understand how, without embedding the compound number in the 
CIF restraint file, you are going to connect a particular CIF restraint file 
with a particular ligand when one has multiple CIF restraint files that all 
use the same residue code.

I will note that GlobalPhasing Ltd.'s CIF restraint file generator, GRADE, 
does embed the name of the input file in its "header" records.  GRADE prefers 
MOL2 files, although it also accepts other formats, including SMILES strings, 
which would defeat this.  In GRADE's earliest days, GPhL recommended SMILES 
strings, and that is what is embedded in the "header" records of those CIF 
restraint files. Using the example above and my current scripts, the input file 
would be named bms561389.mol2, i.e. the line currently looks like:

# GEN: Generated by GRADE 1.1.1 from mol2 file bms561389.mol2 using mogul+qm

However, if you needed the dash and/or CAPITAL letters for easy comparison, I 
could change my scripts accordingly.
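
As a rough illustration of how that header line could tie a restraint file to a compound number, here is a small Python sketch. It assumes the "# GEN:" line has exactly the form shown above (I can vouch only for that one form), and the rule for turning bms561389 into BMS-561389 is a hypothetical convention that matches my file naming, not anything GRADE or COOT defines.

```python
import re
from pathlib import PurePath

# Matches the header line quoted above; other GRADE versions or input
# types (e.g. SMILES) may well write something different.
GEN_LINE = re.compile(
    r"^#\s*GEN:\s*Generated by GRADE\s+\S+\s+from\s+\S+\s+file\s+(\S+)",
    re.MULTILINE,
)


def grade_input_file(cif_text):
    """Return the input file name recorded in the header, or None if absent."""
    match = GEN_LINE.search(cif_text)
    return match.group(1) if match else None


def normalize_compound_id(file_name):
    """Hypothetical convention: 'bms561389.mol2' -> 'BMS-561389'."""
    stem = PurePath(file_name).stem.upper()
    match = re.fullmatch(r"([A-Z]+)-?(\d+)", stem)
    return f"{match.group(1)}-{match.group(2)}" if match else stem


header = "# GEN: Generated by GRADE 1.1.1 from mol2 file bms561389.mol2 using mogul+qm"
input_file = grade_input_file(header)
compound_id = normalize_compound_id(input_file)
print(input_file, compound_id)
```

The normalized identifier could then be compared directly with the HETSYN entry in the coordinate file to pick the right restraint file among several that share a residue code.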

You asked about molecular modeling software.  I have very limited experience 
with various packages, knowing only enough to invert chiral centers when our 
corporate database serves up the "wrong" hand for racemic mixtures or 
homochiral molecules, where the chirality was guessed wrong.  Our 
computational chemists overwhelmingly use MAESTRO, and thus that is what I 
have used to do this for the past 5 years or so.  MAESTRO has an 
extremely hard time with figuring out organic (non-proteinaceous) molecules in 
PDB format, so I feed it only mol (or sdf) and/or mol2 formats.  These formats 
don't know anything about residue codes, but, of course, specify bonding order 
and chirality quite well. So MAESTRO must use internal identifiers for each 
molecule (and, in fact, the project table in MAESTRO uses numbers to identify 
various versions of a molecule). I will ask one of our computational chemists 
to address the issue more fully and get back to you with a more complete answer 
(probably off-line).

Steven



>-----Original Message-----
>From: Mailing list for users of COOT Crystallographic Software
>[mailto:[email protected]] On Behalf Of Paul Emsley
>Sent: Tuesday, January 24, 2012 9:54 PM
>To: [email protected]
>Subject: Re: New restraints, same name
>
>Thanks to all contributors, I have been informed, educated and
>entertained.
>
>A bit of background perhaps... (it seems that I have been living in the
>0.7 world long enough to forget that not everyone else is here). "[T]he
>viewer programs don't care about the restraint dictionaries"  says Seth
>Harris - haha - in olden Coots that was the case... :)  It is my hope
>that Coot will be used for comparison, evaluation, validation and
>manipulation of ligands in protein-ligand complexes and their electron
>density.
>
>My current obsession is with chemical structure diagrams - here's a
>recent screenshot:
>http://lmb.bioch.ox.ac.uk/coot/screenshots/Screenshot-example-2010-01-
>02.png
>
>... and here's one I made earlier today, illustrating the sorts of
>problems I am trying to handle (PI3 Kinase ligand, 4a55):
>http://lmb.bioch.ox.ac.uk/coot/screenshots/Screenshot-Coot-prodrg-
>valence-problem.png
>amusing, eh?
>
>Anyway, to make the chemical diagram and the 3D bonding representation I
>need to construct a description of the ligand that contains bond
>orders.  Hence restraints.  So yes, let me emphasize that this is mostly
>for drawing pictures and I don't see the use case of refinement of
>multiple different ligand complexes as very useful.
>
>I do like Dale's idea - the use of HETNAM and synonyms - so, as I
>understand it, the PDB file has a residue called LIG and the dictionary
>has a comp-id of
>"2-(N-methylmethanesulfonamido)-6-(propan-2-yl)pyrimidine" (or
>XYZ0123456 or whatever) and a HETNAM record in the PDB file provides the
>mapping.  Is this a solution?   It is attractive because it requires
>very little work from me.
>
>I did consider Judit's idea, i.e. check the atom names in the
>coordinates against the dictionary before bonding.  I thought that there
>may be (too many?) pathological cases where the names did match (at
>least for ligand fragments) but the chemistry did not.  Let me know if
>you think that I need not worry so much about that.  This is relatively
>easy to do.  However, this only solves the "tangle" problem - and I
>think that that for practical purposes that may be covered now as I have
>recently turned off restraints auto-loading for several "special"
>three-letter codes - one can (at least) see "noddy" bonding instead of a
>tangle.
>
>To answer Garib's point: yes, in Coot there is indeed a single
>table/dictionary of restraints, with the key/index being the
>comp-id/residue-name.  It applies to all molecules.  I had not before
>considered the option of embedding monomer restraints inside a Coot
>molecule - that might be a cleaner solution. I will ponder on that.  It
>does mean that you will have to read restraints after reading
>coordinates though.
>
>And yes, I do occasionally wonder how computational chemistry software
>(Maestro, Vida for example?) solves this problem.  Presumably such
>software is used to show several overlaying ligand structures (all
>called "LIG"?).  And computational chemists like to see chemistry, and
>not just coloured sticks, right?
>
>Thanks,
>
>Paul.

