[Jmol-developers] [ jmol-Bugs-975793 ] pdb vs cif nomenclature

SourceForge.net Sat, 19 Jun 2004 02:49:10 -0700

Bugs item #975793, was opened at 2004-06-19 11:48
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=379133&aid=975793&group_id=23629


Category: Algorithms
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Miguel (migueljmol)
Assigned to: Miguel (migueljmol)
Summary: pdb vs cif nomenclature

Initial Comment:
Many emails were exchanged on this topic; two are below.

---------------------------- Original Message ----------------------------
Subject: mmCIF/PDB
From:    "Bob Hanson" <[EMAIL PROTECTED]>
Date:    Thu, June 17, 2004 18:03
To:      "Miguel" <[EMAIL PROTECTED]>
--------------------------------------------------------------------------

Miguel, I asked the help desk at Rutgers. Here is the
response. Really, I think you will be OK with the
mmCIF files just using the _auth_ fields:

See answers below.

Zukang

Bob Hanson wrote:

> Can you tell me a little more about the pdb-extract file
> _atom_change_global.h?
>
> Please tell me if I am correct, and if not, what I have wrong:
>
> -basically this is for changing mmCIF names BACK to PDB names.


It changes author-defined atom names back to PDB names.

>
>
> -this is a set of model transformations that is meant to encompass
> all PDB "ATOM" types.


Only for standard amino acids and nucleic acids

>
>
> -this is an n:1 mapping, implying that there are several mmCIF
> conventions all mapping to the same PDB name. For example, HB2 and
> 2HB both map to the PDB name 1HB.


Yes. Some refinement programs use the HB2, HB3 convention; the program
changes those to 1HB, 2HB. Others use the HB1, HB2 convention; the
program changes those to 1HB, 2HB as well.

>
>
> -asterisks on the left would really be single quotes in an mmCIF file.


Yes.

>
>
> -S_T seems odd. Shouldn't it be H71 to 1H5M, not H51 to 1H5M?


It only works if there are H51, H52 and H53. The program will take care
of the C7, H71, H72 and H73 situation.


>
>
> One additional question: When mmCIF files are delivered from rcsb,
> are the "_auth_name" records always the PDB standard names, or are
> they the "author's" names, which might be anything?


Always the PDB standard names.


-----------------
Bob Hanson wrote:

Thank you very much. Almost got it! See below.


Zukang Feng wrote:

> See answers below.
>
> Zukang
>
> Bob Hanson wrote:
>
>> Can you tell me a little more about the pdb-extract file
>> _atom_change_global.h?
>>
>> Please tell me if I am correct, and if not, what I have wrong:
>>
>> -basically this is for changing mmCIF names BACK to PDB names.
>
>
>
>
> It changes author-defined atom names back to PDB names.
>

>
>
> One additional question: When mmCIF files are delivered from rcsb,
> are the "_auth_name" records always the PDB standard names, or are
> they the "author's" names, which might be anything?




Always the PDB standard names.

Is the idea that authors submit presumed "mmCIF" files (or otherwise) to
rcsb, but really they have to be fixed up before depositing? So the idea
is that this atom_change information allows you to first convert the
nonstandard names to standard PDB names for placement in the
_atom_site.auth_atom_id field, then (later?) convert them to mmCIF names
for the _atom_site.label_atom_id field. In this way, the
_atom_site.auth_atom_id field always contains the "standard PDB" name?

Bob Hanson




Yes. It always converts the non-standard names to standard PDB names
first. Later we can automatically convert them to another set of
standard names in _atom_site.label_atom_id. Currently we put IUPAC
standard names in that field. It would be great if the authors fixed
them up before depositing, but I think our programs can handle all the
situations produced by the most popular refinement programs.

Regards,

Zukang Feng




See you later,

Bob

---------------------------- Original Message ----------------------------
Subject: Re: [Jmol-developers] * in .pdb <-> ' in .cif
From:    "Bob Hanson" <[EMAIL PROTECTED]>
Date:    Thu, June 17, 2004, 14:25
To:      [EMAIL PROTECTED]
--------------------------------------------------------------------------

OK, really it's simple, but you need to know a little chemistry if you
want it to make any sense. What is going on is that some amino acid
sidechains have carbons with only one H, some with two, and some with
three.

For the carbons with one H, there is no issue--they are simply referred
to as, say, HB or HD.

For carbons with three Hs, there is also no issue--they are simply
referred to as, say, HD1, HD2, HD3. (Old method: 1HD, 2HD, 3HD.) Or, if
the carbon had a number already, then the old 1HD1, 2HD1 becomes HD11,
HD12.

Same goes for NH2 sidechains (asn, arg, gln). No problem here--same
numbering.

So far, no problems, right? But when a carbon bears two Hs, the
nomenclature has changed. Whereas before, the numbering was a HYDROGEN
count, HB1, HB2, now it is a SUBSTITUENT count, HB2, HB3, with "1"
missing, because that would be "reserved" for the rest of the sidechain,
which gets its numbering a different way.

What they've implemented in pdb-extract is a relatively simple look-up
table that implements both the renumbering and the repositioning of
numbers at the same time.

If you handle the repositioning yourself, the look-up
table is simpler:

CHANGE_H_NUM CH2_ALA = {}
CHANGE_H_NUM CH2_ARG = {"HB", "HD", "HG"}
CHANGE_H_NUM CH2_ASP = {"HB"}
CHANGE_H_NUM CH2_ASN = {"HB"}
CHANGE_H_NUM CH2_CYS = {"HB"}
CHANGE_H_NUM CH2_GLN = {"HB", "HG"}
CHANGE_H_NUM CH2_GLU = {"HB", "HG"}
CHANGE_H_NUM CH2_GLY = {"HA"}
CHANGE_H_NUM CH2_HIS = {"HB"}
CHANGE_H_NUM CH2_ILE = {"HG1"}
CHANGE_H_NUM CH2_LEU = {"HB"}
CHANGE_H_NUM CH2_LYS = {"HB", "HD", "HE", "HG"}
CHANGE_H_NUM CH2_MET = {"HB", "HG"}
CHANGE_H_NUM CH2_PHE = {"HB"}
CHANGE_H_NUM CH2_PRO = {"HB", "HD", "HG", "H"}
CHANGE_H_NUM CH2_SER = {"HB"}
CHANGE_H_NUM CH2_THR = {}
CHANGE_H_NUM CH2_TRP = {"HB"}
CHANGE_H_NUM CH2_TYR = {"HB"}
CHANGE_H_NUM CH2_VAL = {}
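
A minimal Python sketch of the "handle the repositioning yourself" idea, assuming old-style PDB hydrogen names carry a single leading digit (1HB, 2HD1). The names `pdb_h_to_iupac` and `CH2_STEMS` are hypothetical, only a few residues are copied from the table above, and the proline HT1/HT2 case is not handled:

```python
import re

# Stems whose carbon bears two hydrogens, copied from the table above
# for a few residues only (an illustrative subset, not the full table).
CH2_STEMS = {
    "ALA": set(),
    "GLY": {"HA"},
    "ILE": {"HG1"},
    "LEU": {"HB"},
    "LYS": {"HB", "HD", "HE", "HG"},
}

# Old PDB style: one leading digit (1-3), then the stem, e.g. "1HB", "2HD1".
_OLD_STYLE = re.compile(r"^([123])(H[A-Z0-9]*)$")


def pdb_h_to_iupac(residue, name):
    """Convert an old-style PDB hydrogen name to the newer style."""
    match = _OLD_STYLE.match(name)
    if match is None:
        return name  # not an old-style hydrogen name; leave it untouched
    number, stem = int(match.group(1)), match.group(2)
    if stem in CH2_STEMS.get(residue, set()):
        number += 1  # renumbering: hydrogen count -> substituent count
    return f"{stem}{number}"  # repositioning: the digit moves to the end
```

With this sketch, `pdb_h_to_iupac("LEU", "1HB")` gives `HB2` (renumbered and repositioned), while `pdb_h_to_iupac("LEU", "1HD1")` gives `HD11` (repositioned only).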

(Note that "HT" will also become just "H" in PRO. I have to admit, I
don't quite understand what is going on with proline. Do "HT1" "HT2"
refer to the hydrogens on the N? If so, then this does make sense,
because there is a carbon on this N, so the H numbering by IUPAC should
start with 2.) Here's the pdb-extract code:

CHANGE_ATOM_NAME_23_ONLY IU_PRO = {
        8,
         { { "HT1",  "H2"   },
           { "HT2",  "H3"   },
           { "1HB",  "HB2"  },
           { "2HB",  "HB3"  },
           { "1HG",  "HG2"  },
           { "2HG",  "HG3"  },
           { "1HD",  "HD2"  },
           { "2HD",  "HD3"  }
         }
};

--------------------

>  
>
>>A,C,T,G,I:
>>
>>      OnP becomes OPn
>>      2HO* becomes HO2'
>>      1X* becomes X'
>>      2X* becomes X''
>>      "5M" becomes "7"
>>      nXm becomes Xmn
>>      * becomes '
>>
>>in that order.
>>    
>>
>
>That makes sense.
>  
>

add "U" to that list. It's: A C T G U I
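
The ordered replacements quoted above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the pdb-extract code: the function name `pdb_to_cif_nucleic_name` is hypothetical, "X" is taken to mean the rest of the atom name, and "n" and "m" are taken to be digits.

```python
import re

# Residues the rules apply to, per the list above (with "U" added).
NUCLEOTIDES = {"A", "C", "G", "T", "U", "I"}


def pdb_to_cif_nucleic_name(residue, name):
    """Apply the quoted replacement rules one after another, in order."""
    if residue not in NUCLEOTIDES:
        return name
    name = re.sub(r"^O(\d)P$", r"OP\1", name)      # OnP  becomes OPn
    if name == "2HO*":                             # 2HO* becomes HO2'
        name = "HO2'"
    name = re.sub(r"^1(.+)\*$", r"\1'", name)      # 1X*  becomes X'
    name = re.sub(r"^2(.+)\*$", r"\1''", name)     # 2X*  becomes X''
    name = name.replace("5M", "7")                 # "5M" becomes "7"
    # nXm becomes Xmn: a leading digit moves behind the trailing digits.
    name = re.sub(r"^(\d)([A-Z]+)(\d+)$", r"\2\3\1", name)
    return name.replace("*", "'")                  # *    becomes '
```

The order matters: `1H5M` first becomes `1H7` through the "5M" rule and only then `H71` through the nXm rule.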

>  
>
>>Then, in filterlib-v8\include\_atom_change_global.h we have seven
>>fundamental models which are mapped to more interesting groups such as
>>DAR, DAS, DCY.
>>    
>>
>
>Based upon my relatively short experience working with this data, if
>they are using group names then they are doomed to failure.
>  
>
Now, now. Every group must have a relatively arbitrary assignment of
atom names. That's what updates are for. For all the "standard"
monomers, there is no real problem, and the list of "multiple atom"
changes is not that long:

static CHANGE_POOLS multiple_atoms[MULTIPLE_ATOMS] = {
       "A",   &M_N4,
       "C",   &M_N4,
       "G",   &M_N4,
       "T",   &M_N4,
       "U",   &M_N4,

       /* NO "I" HERE? maybe a mistake. */

       "GLY", &M_02,

       "ASN", &M_04,
       "ASP", &M_04,
       "CYS", &M_04,
       "DAS", &M_04,
       "DCY", &M_04,
       "DLE", &M_04,
       "DPN", &M_04,
       "DSN", &M_04,
       "DSP", &M_04,
       "DTR", &M_04,
       "DTY", &M_04,
       "HIS", &M_04,
       "LEU", &M_04,
       "PHE", &M_04,
       "SER", &M_04,
       "TRP", &M_04,
       "TYR", &M_04,

       "DIL", &M_06,
       "ILE", &M_06,

       "DGL", &M_08,
       "DGN", &M_08,
       "GLN", &M_08,
       "GLU", &M_08,
       "MET", &M_08,

       "ARG", &M_12,
       "DAR", &M_12,
       "DPR", &M_12,
       "PRO", &M_12,

       "DLY", &M_16,
       "LYS", &M_16,
};

OK, maybe it's a bit long. I'm presuming you have these .h files, right?
If not, get the pdb-extract source and take a look. The above wouldn't
make any sense without them. Take, for example, model #4:

static CHANGE_ATOM_NAME M_04 = {
        4,
         { { "HB2", "1HB" },
           { "HB3", "2HB" },
           { "2HB", "1HB" },
           { "3HB", "2HB" }
         }
};

This looks to me to be showing how two different presumed mmCIF formats
might both be returned to the old PDB format. That could certainly be a
complication when reading CIF files. But again, it's just in the files
with H atoms--usually NMR files.

>  
>
>>I see from mmCIF output from the RCSB that BOTH the IUPAC and PDB
>>names ("auth") are there. In actuality, it would appear that different
>>authors have different conventions in PDB names (and maybe even CIF
>>names), but from the above it also appears that even CIF name formats
>>have evolved or are somewhat nonstandard.
>>
>>_atom_site.group_PDB
>>_atom_site.id
>>_atom_site.type_symbol
>>_atom_site.label_atom_id
>>_atom_site.label_alt_id
>>_atom_site.label_comp_id
>>_atom_site.label_asym_id
>>_atom_site.label_entity_id
>>_atom_site.label_seq_id
>>_atom_site.pdbx_PDB_ins_code
>>_atom_site.Cartn_x
>>_atom_site.Cartn_y
>>_atom_site.Cartn_z
>>...
>>_atom_site.auth_seq_id
>>_atom_site.auth_comp_id
>>_atom_site.auth_asym_id
>>_atom_site.auth_atom_id
>>_atom_site.pdbx_PDB_model_num
>>
>>ATOM 1     N N     . LEU A 1 1  ... 3  LEU A N    1
>>ATOM 2     C CA    . LEU A 1 1  ... 3  LEU A CA   1
>>ATOM 3     C C     . LEU A 1 1  ... 3  LEU A C    1
>>...
>>ATOM 13    H HB2   . LEU A 1 1  ... 3  LEU A 1HB  1
>>ATOM 14    H HB3   . LEU A 1 1  ... 3  LEU A 2HB  1
>>
>>
>>
>>So that translation is a snap, I think.
>>    
>>
>
>Glad it looks easy to you :-)
>  
>
Well, my point is, both names are there. You wouldn't have to use any
algorithm at all.
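
As a rough Python illustration of reading both names straight from the file: the sketch below pairs `label_atom_id` with `auth_atom_id` by column position. The sample loop is trimmed to a handful of the columns listed earlier (it is not a complete record), `atom_name_pairs` is a hypothetical helper, and the plain whitespace split assumes no quoted values containing spaces.

```python
# A trimmed, illustrative _atom_site loop: only some of the columns shown
# in the quoted example are kept, so this is not a real, complete record.
SAMPLE = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.auth_comp_id
_atom_site.auth_atom_id
ATOM 1 N LEU LEU N
ATOM 2 CA LEU LEU CA
ATOM 13 HB2 LEU LEU 1HB
ATOM 14 HB3 LEU LEU 2HB
"""


def atom_name_pairs(text):
    """Return (label_atom_id, auth_atom_id) for each atom row."""
    columns, pairs = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])  # remember column order
        elif columns and line and not line.startswith(("loop_", "#")):
            fields = line.split()
            if len(fields) != len(columns):
                continue  # skip rows this naive split cannot handle
            row = dict(zip(columns, fields))
            pairs.append((row["label_atom_id"], row["auth_atom_id"]))
    return pairs
```

Here `atom_name_pairs(SAMPLE)` includes `("HB2", "1HB")` and `("HB3", "2HB")`, so the translation comes from the file itself and no renaming table is needed.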

>I also assumed that it would be best if the students of the future used
>the standard labelling instead of the .pdb * format. So I was hoping to
>start using the ' names within Jmol.
>
>I now fear that this is too much for me to bite off at this time ... I
>am going to let it rest for a while.
>
>  
>
Actually, if that is ALL you want to do, don't despair. As shown above,
the mmCIF files give both numbering schemes. I'm sure this practice will
persist well into the future. They really need that double-check, I
suspect. The worry I would have is that authors will themselves start
using the IUPAC naming, in which case the AUTH names might start being
the same as IUPAC or omitted entirely. Remember, this is JUST FOR THE
Hs. How much do you care about them? The other atoms just get a simple
algorithm involving some replacements and some rearranging, but those
are ONLY for A C G T U I.

Not so bad!

Bob


----------------------------------------------------------------------



