For the carbons with one H, there is no issue--they are simply referred to as, say, HB or HD.
For carbons with three Hs, there is also no issue--they are simply referred to as, say, HD1, HD2, HD3. (Old method: 1HD, 2HD, 3HD.) Or, if the carbon had a number already, then the old 1HD1, 2HD1 become HD11, HD12.
Same goes for NH2 sidechains (Asn, Arg, Gln). No problem here--same numbering.
So far, no problems, right? But when a carbon bears two Hs, the nomenclature has changed. Whereas before, the numbering was a HYDROGEN count, HB1, HB2, now it is a SUBSTITUENT count, HB2, HB3, with "1" missing, because that would be "reserved" for the rest of the sidechain, which gets its numbering a different way.
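For what it's worth, the two mechanical steps described above (moving a leading digit to the end, and shifting a CH2 hydrogen count to a substituent count) could be sketched in C roughly like this. The names reposition_digit and bump_ch2_number are made up for illustration and are not from pdb-extract; deciding which hydrogens actually sit on a CH2 carbon is left to the caller.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: move a leading digit to the end of the name,
 * e.g. "1HD" -> "HD1", "2HD1" -> "HD12". Names without a leading
 * digit are copied unchanged. */
void reposition_digit(const char *old_name, char *out, size_t out_size)
{
    size_t len = strlen(old_name);

    if (len > 1 && isdigit((unsigned char)old_name[0]))
        snprintf(out, out_size, "%s%c", old_name + 1, old_name[0]);
    else
        snprintf(out, out_size, "%s", old_name);
}

/* Hypothetical helper: for a CH2 hydrogen whose digit is already at
 * the end, turn the hydrogen count into a substituent count,
 * e.g. "HB1" -> "HB2", "HB2" -> "HB3". Call it only for CH2 groups. */
void bump_ch2_number(char *name)
{
    size_t len = strlen(name);

    if (len > 1 && (name[len - 1] == '1' || name[len - 1] == '2'))
        name[len - 1]++;
}
```

So an old 1HB on a CH2 carbon would go through reposition_digit to give HB1 and then through bump_ch2_number to give HB2. Each atom is renamed on its own, so the order in which the two hydrogens are processed should not matter.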
What they've implemented in pdb-extract is a relatively simple look-up table that implements both the renumbering and the repositioning of numbers at the same time.
If you handle the repositioning yourself, the look-up table is simpler:
CHANGE_H_NUM CH2_ALA = {}
CHANGE_H_NUM CH2_ARG = {"HB", "HG"}
CHANGE_H_NUM CH2_ASP = {"HB"}
CHANGE_H_NUM CH2_ASN = {"HB"}
CHANGE_H_NUM CH2_CYS = {"HB"}
CHANGE_H_NUM CH2_GLN = {"HB", "HG"}
CHANGE_H_NUM CH2_GLU = {"HB", "HG"}
CHANGE_H_NUM CH2_GLY = {"HA"}
CHANGE_H_NUM CH2_HIS = {"HB"}
CHANGE_H_NUM CH2_ILE = {"HG1"}
CHANGE_H_NUM CH2_LEU = {"HB"}
CHANGE_H_NUM CH2_LYS = {"HB", "HD", "HE", "HG"}
CHANGE_H_NUM CH2_MET = {"HB", "HG"}
CHANGE_H_NUM CH2_PHE = {"HB"}
CHANGE_H_NUM CH2_PRO = {"HB", "HD", "HG", "H"}
CHANGE_H_NUM CH2_SER = {"HB"}
CHANGE_H_NUM CH2_THR = {}
CHANGE_H_NUM CH2_TRP = {"HB"}
CHANGE_H_NUM CH2_TYR = {"HB"}
CHANGE_H_NUM CH2_VAL = {}
(Note that "HT" will also become just "H" in PRO. I have to admit, I don't quite understand what is going on with proline. Do "HT1" "HT2" refer to the hydrogens on the N? If so, then this does make sense, because there is a carbon on this N, so the H numbering by IUPAC should start with 2.)
Here's the pdb-extract code:
CHANGE_ATOM_NAME_23_ONLY IU_PRO = {
8,
{ { "HT1", "H2" },
{ "HT2", "H3" },
{ "1HB", "HB2" },
{ "2HB", "HB3" },
{ "1HG", "HG2" },
{ "2HG", "HG3" },
{ "1HD", "HD2" },
{ "2HD", "HD3" }
}
};
--------------------
A,C,T,G,I:
OnP becomes OPn
2HO* becomes HO2'
1X* becomes X'
2X* becomes X''
"5M" becomes "7"
nXm becomes Xmn
* becomes '
in that order.
That makes sense.
Add "U" to that list. It's: A C T G U I
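Here is a rough C sketch of how that ordered list of replacements might be applied to a single nucleotide atom name. It is only my reading of the rules: I am assuming "X" stands for the rest of the name and "n" for a single digit, and rename_nucleotide_atom is a made-up name, not something from pdb-extract.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from pdb-extract. Applies the seven
 * replacements listed above, in the stated order, to one atom name.
 * Assumes "X" is the rest of the name and "n" is one digit. */
void rename_nucleotide_atom(const char *old_name, char *out, size_t out_size)
{
    char buf[16];
    size_t len;
    char *p;

    snprintf(buf, sizeof buf, "%s", old_name);
    len = strlen(buf);

    /* The first four rules cannot apply to the same name twice over,
     * so they are tried in order as alternatives. */
    if (len == 3 && buf[0] == 'O' && isdigit((unsigned char)buf[1]) && buf[2] == 'P') {
        buf[2] = buf[1];                 /* OnP -> OPn */
        buf[1] = 'P';
    } else if (strcmp(buf, "2HO*") == 0) {
        strcpy(buf, "HO2'");             /* 2HO* -> HO2' */
    } else if (len >= 3 && (buf[0] == '1' || buf[0] == '2') && buf[len - 1] == '*') {
        int two = (buf[0] == '2');       /* 1X* -> X', 2X* -> X'' */
        memmove(buf, buf + 1, len - 2);  /* drop leading digit and trailing '*' */
        buf[len - 2] = '\'';
        if (two)
            buf[len - 1] = '\'';
        buf[len - 1 + two] = '\0';
    }

    p = strstr(buf, "5M");               /* "5M" -> "7" */
    if (p != NULL) {
        *p = '7';
        memmove(p + 1, p + 2, strlen(p + 2) + 1);
    }

    len = strlen(buf);                   /* nXm -> Xmn */
    if (len > 1 && isdigit((unsigned char)buf[0])) {
        char digit = buf[0];
        memmove(buf, buf + 1, len - 1);
        buf[len - 1] = digit;
    }

    for (p = buf; *p != '\0'; p++)       /* * -> ' */
        if (*p == '*')
            *p = '\'';

    snprintf(out, out_size, "%s", buf);
}
```

The order matters mostly for 2HO*: if the general 2X* rule ran first, it would turn that name into HO'' rather than HO2'. Likewise "5M" has to be replaced before the leading digit is moved, so that 1H5M ends up as H71.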
Now, now. Every group must have a relatively arbitrary assignment of atom names. That's what updates are for. For all the "standard" monomers, there is no real problem, and the list of "multiple atom" changes is not that long:
Then, in filterlib-v8\include\_atom_change_global.h we have seven fundamental models which are mapped to more interesting groups such as DAR, DAS, DCY. Based upon my relatively short experience working with this data, if they are using group names then they are doomed to failure.
static CHANGE_POOLS multiple_atoms[MULTIPLE_ATOMS] = {
"A", &M_N4,
"C", &M_N4,
"G", &M_N4,
"T", &M_N4,
"U", &M_N4,
(NO "I" HERE? Maybe a mistake.)
"GLY", &M_02,
"ASN", &M_04,
"ASP", &M_04,
"CYS", &M_04,
"DAS", &M_04,
"DCY", &M_04,
"DLE", &M_04,
"DPN", &M_04,
"DSN", &M_04,
"DSP", &M_04,
"DTR", &M_04,
"DTY", &M_04,
"HIS", &M_04,
"LEU", &M_04,
"PHE", &M_04,
"SER", &M_04,
"TRP", &M_04,
"TYR", &M_04,
"DIL", &M_06,
"ILE", &M_06,
"DGL", &M_08,
"DGN", &M_08,
"GLN", &M_08,
"GLU", &M_08,
"MET", &M_08,
"ARG", &M_12,
"DAR", &M_12,
"DPR", &M_12,
"PRO", &M_12,
"DLY", &M_16,
"LYS", &M_16,
};
OK, maybe it's a bit long. I'm presuming you have these .h files, right? If not, get the pdb-extract source and take a look. The above wouldn't make any sense without them. Take, for example, model #4:
static CHANGE_ATOM_NAME M_04 = {
4,
{ { "HB2", "1HB" },
{ "HB3", "2HB" },
{ "2HB", "1HB" },
{ "3HB", "2HB" }
}
};
This looks to me to be showing how two different presumed mmCIF formats might both be returned to the old PDB format. That could certainly be a complication reading CIF files. But again, it's just in the files with H atoms--usually NMR files.
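To make the table idea concrete, here is a minimal C sketch of a lookup over pairs like those in model #4. The struct name_pair and the function map_atom_name are stand-ins I made up; the real CHANGE_ATOM_NAME definition lives in the pdb-extract headers and may well look different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pdb-extract's CHANGE_ATOM_NAME entries. */
struct name_pair {
    const char *from;
    const char *to;
};

/* The four pairs quoted above for model #4. */
const struct name_pair model_04[] = {
    { "HB2", "1HB" },
    { "HB3", "2HB" },
    { "2HB", "1HB" },
    { "3HB", "2HB" },
};

/* Return the mapped name, or the input unchanged if no entry matches.
 * Only one lookup is done, so a result is never mapped a second time. */
const char *map_atom_name(const struct name_pair *table, size_t count,
                          const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].from, name) == 0)
            return table[i].to;
    return name;
}
```

The single pass is the point here: "3HB" maps to "2HB", and "2HB" is itself a key in the table, so applying the table repeatedly would give the wrong answer.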
Well, my point is, both names are there. You wouldn't have to use any algorithm at all.
I see from mmCIF output from the RCSB that BOTH the IUPAC and PDB names ("auth") are there. In actuality, it would appear that different authors have different conventions in PDB names (and maybe even CIF names), but from the above it also appears that even CIF name formats have evolved or are somewhat nonstandard.
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
...
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 1 N N . LEU A 1 1 ... 3 LEU A N 1
ATOM 2 C CA . LEU A 1 1 ... 3 LEU A CA 1
ATOM 3 C C . LEU A 1 1 ... 3 LEU A C 1
...
ATOM 13 H HB2 . LEU A 1 1 ... 3 LEU A 1HB 1
ATOM 14 H HB3 . LEU A 1 1 ... 3 LEU A 2HB 1
So that translation is a snap, I think.
Glad it looks easy to you :-)
Actually, if that is ALL you want to do, don't despair. As shown above, the mmCIF files give both numbering schemes. I'm sure this practice will persist well into the future. They really need that double-check, I suspect. The worry I would have is that authors will themselves start using the IUPAC naming, in which case the AUTH names might start being the same as IUPAC or omitted entirely. Remember, this is JUST FOR THE Hs. How much do you care about them? The other atoms just get a simple algorithm involving some replacements and some rearranging, but those are ONLY for A C G T U I.
I also assumed that it would be best if the students of the future used the standard labelling instead of the .pdb * format. So I was hoping to start using the ' names within Jmol.
I now fear that this is too much for me to bite off at this time ... I am going to let it rest for a while.
Not so bad!
Bob
--
Robert M. Hanson, [EMAIL PROTECTED], 507-646-3107
Professor of Chemistry, St. Olaf College
1520 St. Olaf Ave., Northfield, MN 55057
mailto:[EMAIL PROTECTED]
http://www.stolaf.edu/people/hansonr
_______________________________________________
Jmol-developers mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/jmol-developers
