Re: [Jmol-developers] * in .pdb <-> ' in .cif

Bob Hanson Thu, 17 Jun 2004 05:44:10 -0700

OK, really it's simple, but you need to know a little chemistry if you want it to make any sense. What is going on is that some amino acid sidechains have carbons with only one H, some with two, and some with three.

For the carbons with one H, there is no issue--they are simply referred to as, say, HB or HD.

For carbons with three Hs, there is also no issue--they are simply referred to as, say, HD1, HD2, HD3. (Old method: 1HD, 2HD, 3HD.) Or, if the carbon had a number already, then the old 1HD1, 2HD1 becomes HD11, HD12.

Same goes for NH2 sidechains (asn, arg, gln). No problem here--same numbering.

So far, no problems, right? But when a carbon bears two Hs, the nomenclature has changed. Whereas before, the numbering was a HYDROGEN count, HB1, HB2, now it is a SUBSTITUENT count, HB2, HB3, with "1" missing, because that would be "reserved" for the rest of the sidechain, which gets its numbering a different way.

What they've implemented in pdb-extract is a relatively simple look-up table that implements both the renumbering and the repositioning of numbers at the same time.

If you handle the repositioning yourself, the look-up table is simpler:

CHANGE_H_NUM CH2_ALA = {}
CHANGE_H_NUM CH2_ARG = {"HB", "HG"}
CHANGE_H_NUM CH2_ASP = {"HB"}
CHANGE_H_NUM CH2_ASN = {"HB"}
CHANGE_H_NUM CH2_CYS = {"HB"}
CHANGE_H_NUM CH2_GLN = {"HB", "HG"}
CHANGE_H_NUM CH2_GLU = {"HB", "HG"}
CHANGE_H_NUM CH2_GLY = {"HA"}
CHANGE_H_NUM CH2_HIS = {"HB"}
CHANGE_H_NUM CH2_ILE = {"HG1"}
CHANGE_H_NUM CH2_LEU = {"HB"}
CHANGE_H_NUM CH2_LYS = {"HB", "HD", "HE", "HG"}
CHANGE_H_NUM CH2_MET = {"HB", "HG"}
CHANGE_H_NUM CH2_PHE = {"HB"}
CHANGE_H_NUM CH2_PRO = {"HB", "HD", "HG", "H"}
CHANGE_H_NUM CH2_SER = {"HB"}
CHANGE_H_NUM CH2_THR = {}
CHANGE_H_NUM CH2_TRP = {"HB"}
CHANGE_H_NUM CH2_TYR = {"HB"}
CHANGE_H_NUM CH2_VAL = {}
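
As a rough illustration of how that simpler table might be used, here is a minimal Python sketch. The names CH2_STEMS, reposition and to_iupac are made up for this example and are not from pdb-extract or Jmol. It assumes old-style names carry a single leading digit, and it does not handle the HT-to-H rename for PRO mentioned below.

```python
import re

# CH2 hydrogen stems per residue, copied from the table above.
CH2_STEMS = {
    "ALA": set(), "ARG": {"HB", "HG"}, "ASP": {"HB"}, "ASN": {"HB"},
    "CYS": {"HB"}, "GLN": {"HB", "HG"}, "GLU": {"HB", "HG"}, "GLY": {"HA"},
    "HIS": {"HB"}, "ILE": {"HG1"}, "LEU": {"HB"},
    "LYS": {"HB", "HD", "HE", "HG"}, "MET": {"HB", "HG"}, "PHE": {"HB"},
    "PRO": {"HB", "HD", "HG", "H"}, "SER": {"HB"}, "THR": set(),
    "TRP": {"HB"}, "TYR": {"HB"}, "VAL": set(),
}

def reposition(name):
    """Move a single leading digit to the end: '1HB' -> 'HB1', '2HG1' -> 'HG12'."""
    m = re.fullmatch(r"(\d)(\D.*)", name)
    return m.group(2) + m.group(1) if m else name

def to_iupac(residue, old_name):
    """Reposition the digit, then bump 1/2 to 2/3 for the CH2 stems only."""
    name = reposition(old_name)
    stem, digit = name[:-1], name[-1:]
    if digit in ("1", "2") and stem in CH2_STEMS.get(residue, set()):
        return stem + str(int(digit) + 1)
    return name
```

With this sketch, to_iupac("LEU", "1HB") gives "HB2", while a methyl hydrogen such as to_iupac("LEU", "1HD1") is only repositioned, to "HD11".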

(Note that "HT" will also become just "H" in PRO. I have to admit, I don't quite understand what is going on with proline. Do "HT1" and "HT2" refer to the hydrogens on the N? If so, then this does make sense, because there is a carbon on this N, so the H numbering by IUPAC should start with 2.) Here's the pdb-extract code:

CHANGE_ATOM_NAME_23_ONLY IU_PRO = {
       8,
        { { "HT1",  "H2"   },
          { "HT2",  "H3"   },
          { "1HB",  "HB2"  },
          { "2HB",  "HB3"  },
          { "1HG",  "HG2"  },
          { "2HG",  "HG3"  },
          { "1HD",  "HD2"  },
          { "2HD",  "HD3"  }
        }
};

--------------------

A,C,T,G,I:

        OnP becomes OPn
        2HO* becomes HO2'
        1X* becomes X'
        2X* becomes X''
        "5M" becomes "7"
        nXm becomes Xmn
        * becomes '

in that order.
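
One possible reading of those rules, as a Python sketch. The regular expressions are my interpretation, not pdb-extract code: "n" and "m" are taken to be single digits and "X" the rest of the atom name, and RULES and rename_nucleotide_atom are names invented for this example.

```python
import re

# Ordered (pattern, replacement) pairs; the order follows the list above.
RULES = [
    (re.compile(r"^O(\d)P$"), r"OP\1"),               # OnP  -> OPn
    (re.compile(r"^2HO\*$"), "HO2'"),                 # 2HO* -> HO2'
    (re.compile(r"^1(.+)\*$"), r"\1'"),               # 1X*  -> X'
    (re.compile(r"^2(.+)\*$"), r"\1''"),              # 2X*  -> X''
    (re.compile(r"5M"), "7"),                         # 5M   -> 7
    (re.compile(r"^(\d)([A-Z]+)(\d)$"), r"\2\3\1"),   # nXm  -> Xmn
    (re.compile(r"\*"), "'"),                         # *    -> '
]

def rename_nucleotide_atom(name):
    """Apply every rule in sequence to one atom name."""
    for pattern, replacement in RULES:
        name = pattern.sub(replacement, name)
    return name
```

The order matters in this reading: "2HO*" has to be caught before the general 2X* rule, and "1H5M" must become "1H7" before the nXm rule can turn it into "H71".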

That makes sense.


Add "U" to that list. It's: A C T G U I

Then, in filterlib-v8\include\_atom_change_global.h we have seven fundamental models which are mapped to more interesting groups such as DAR, DAS, DCY.

Based upon my relatively short experience working with this data, if they are using group names then they are doomed to failure.

Now, now. Every group must have a relatively arbitrary assignment of atom names. That's what updates are for. For all the "standard" monomers, there is no real problem, and the list of "multiple atom" changes is not that long:

static CHANGE_POOLS multiple_atoms[MULTIPLE_ATOMS] = {
      "A",   &M_N4,
      "C",   &M_N4,
      "G",   &M_N4,
      "T",   &M_N4,
      "U",   &M_N4,

(NO "I" HERE? maybe a mistake.)

      "GLY", &M_02,

      "ASN", &M_04,
      "ASP", &M_04,
      "CYS", &M_04,
      "DAS", &M_04,
      "DCY", &M_04,
      "DLE", &M_04,
      "DPN", &M_04,
      "DSN", &M_04,
      "DSP", &M_04,
      "DTR", &M_04,
      "DTY", &M_04,
      "HIS", &M_04,
      "LEU", &M_04,
      "PHE", &M_04,
      "SER", &M_04,
      "TRP", &M_04,
      "TYR", &M_04,

      "DIL", &M_06,
      "ILE", &M_06,

      "DGL", &M_08,
      "DGN", &M_08,
      "GLN", &M_08,
      "GLU", &M_08,
      "MET", &M_08,

      "ARG", &M_12,
      "DAR", &M_12,
      "DPR", &M_12,
      "PRO", &M_12,

      "DLY", &M_16,
      "LYS", &M_16,
};

OK, maybe it's a bit long. I'm presuming you have these .h files, right? If not, get the pdb-extract source and take a look. The above wouldn't make any sense without them. Take, for example, model#4:

static CHANGE_ATOM_NAME M_04 = {
       4,
        { { "HB2", "1HB" },
          { "HB3", "2HB" },
          { "2HB", "1HB" },
          { "3HB", "2HB" }
        }
};

This looks to me to be showing how two different presumed mmCIF formats might both be returned to the old PDB format. That could certainly be a complication reading CIF files. But again, it's just in the files with H atoms--usually NMR files.
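
To make that concrete, here is a small Python sketch of the group-to-model lookup. Only model 4 is filled in, because that is the only model quoted here; the function name to_pdb_name is made up for this example, and any group that uses a different model is simply passed through unchanged.

```python
# Model 4 as quoted above: both CIF-style variants map back to old PDB names.
M_04 = {
    "HB2": "1HB",
    "HB3": "2HB",
    "2HB": "1HB",
    "3HB": "2HB",
}

# Groups that point at M_04 in the multiple_atoms table above.
M_04_GROUPS = (
    "ASN", "ASP", "CYS", "DAS", "DCY", "DLE", "DPN", "DSN", "DSP",
    "DTR", "DTY", "HIS", "LEU", "PHE", "SER", "TRP", "TYR",
)
MULTIPLE_ATOMS = {group: M_04 for group in M_04_GROUPS}

def to_pdb_name(group, atom_name):
    """Return the old PDB name, or the input unchanged if no rule applies."""
    return MULTIPLE_ATOMS.get(group, {}).get(atom_name, atom_name)
```

Both to_pdb_name("LEU", "HB2") and to_pdb_name("LEU", "2HB") come back as "1HB", which is the two-formats-to-one behaviour described above.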

I see from mmCIF output from the RCSB that BOTH the IUPAC and PDB names
("auth") are there. In actuality, it would appear that different authors
have different conventions in PDB names (and maybe even CIF names), but
from the above it also appears that even CIF name formats have evolved or
are somewhat nonstandard.

_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
...
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num

ATOM 1     N N     . LEU A 1 1  ... 3  LEU A N    1
ATOM 2     C CA    . LEU A 1 1  ... 3  LEU A CA   1
ATOM 3     C C     . LEU A 1 1  ... 3  LEU A C    1
...
ATOM 13    H HB2   . LEU A 1 1  ... 3  LEU A 1HB  1
ATOM 14    H HB3   . LEU A 1 1  ... 3  LEU A 2HB  1

So that translation is a snap, I think.

Glad it looks easy to you :-)

Well, my point is, both names are there. You wouldn't have to use any algorithm at all.
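
For instance, a reader could build the translation straight from the file. The Python sketch below assumes the _atom_site column names have already been collected into a list and that every row is a whitespace-separated line with no quoted values; a real CIF parser has to handle quoting. The function name label_to_auth and the reduced column set in the usage lines are for illustration only.

```python
def label_to_auth(columns, rows):
    """Map (residue, IUPAC label name) to the author (PDB) atom name."""
    i_label = columns.index("_atom_site.label_atom_id")
    i_comp = columns.index("_atom_site.label_comp_id")
    i_auth = columns.index("_atom_site.auth_atom_id")
    mapping = {}
    for row in rows:
        fields = row.split()  # naive split: assumes no quoted fields
        mapping[(fields[i_comp], fields[i_label])] = fields[i_auth]
    return mapping

# Reduced column set for illustration; values taken from the LEU rows above.
columns = [
    "_atom_site.group_PDB",
    "_atom_site.id",
    "_atom_site.type_symbol",
    "_atom_site.label_atom_id",
    "_atom_site.label_comp_id",
    "_atom_site.auth_atom_id",
]
rows = [
    "ATOM 13 H HB2 LEU 1HB",
    "ATOM 14 H HB3 LEU 2HB",
]
names = label_to_auth(columns, rows)
```

Here names[("LEU", "HB2")] is "1HB", read directly from the file rather than computed.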

I also assumed that it would be best if the students of the future used
the standard labelling instead of the .pdb * format. So I was hoping to
start using the ' names within Jmol.

I now fear that this is too much for me to bite off at this time ... I am
going to let it rest for a while.

Actually, if that is ALL you want to do, don't despair. As shown above, the mmCIF files give both numbering schemes. I'm sure this practice will persist well into the future. They really need that double-check, I suspect. The worry I would have is that authors will themselves start using the IUPAC naming, in which case the AUTH names might start being the same as IUPAC or omitted entirely. Remember, this is JUST FOR THE Hs. How much do you care about them? The other atoms just get a simple algorithm involving some replacements and some rearranging, but those are ONLY for A C G T U I.

Not so bad!

Bob

-- Robert M. Hanson, [EMAIL PROTECTED], 507-646-3107 Professor of Chemistry, St. Olaf College 1520 St. Olaf Ave., Northfield, MN 55057 mailto:[EMAIL PROTECTED] http://www.stolaf.edu/people/hansonr

