Bugs item #975793 was opened at 2004-06-19 04:48
Message generated for change (Comment added) made by hansonr
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=379133&aid=975793&group_id=23629
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Algorithms
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Miguel (migueljmol)
>Assigned to: Bob Hanson (hansonr)
Summary: pdb vs cif nomenclature

Initial Comment:
Many emails were exchanged on this topic. Two are below.

---------------------------- Original Message ----------------------------
Subject: mmCIF/PDB
From: "Bob Hanson" <[EMAIL PROTECTED]>
Date: Thu, June 17, 2004 18:03
To: "Miguel" <[EMAIL PROTECTED]>
--------------------------------------------------------------------------

Miguel,

I asked the help desk at Rutgers. Here was the response. Really, I think you will be OK with the mmCIF files just using the _auth_ fields:

See answers below.

Zukang

Bob Hanson wrote:
> Can you tell me a little more about the pdb-extract file
> _atom_change_global.h?
>
> Please tell me if I am correct, and if not, what I have wrong:
>
> -basically this is for changing mmCIF names BACK to PDB names.

It changes author-defined atom names back to PDB names.

> -this is a set of model transformations that is meant to encompass all PDB "ATOM" types.

Only for standard amino acids and nucleic acids.

> -this is an n:1 mapping, implying that there are several mmCIF
> conventions all mapping to the same PDB name. For example, HB2 and 2HB both map to the PDB name 1HB.

Yes. Some refinement programs use the HB2, HB3 convention; the program changes those to 1HB, 2HB. Others use the HB1, HB2 convention; the program changes those to 1HB, 2HB as well.

> -asterisks on the left would really be single quotes in an mmCIF file.

Yes.

> -S_T seems odd. Shouldn't it be H71 to 1H5M, not H51 to 1H5M?

It only works if there are H51, H52 and H53. The program will take care of the C7, H71, H72 and H73 situation.
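The n:1 mapping Zukang confirms can be pictured as a plain look-up table. The C sketch below is a minimal illustration, not pdb-extract's code: the `NamePair` type and the `to_pdb_name` function are hypothetical, and the table entries are copied from the M_04 model quoted later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical {from, to} pair; pdb-extract's real CHANGE_ATOM_NAME layout may differ. */
typedef struct {
    const char *from; /* name as written by the author or refinement program */
    const char *to;   /* standard PDB name */
} NamePair;

/* Modeled on M_04: two conventions (HB2/HB3 and 2HB/3HB) collapse onto 1HB/2HB. */
static const NamePair beta_ch2[] = {
    { "HB2", "1HB" },
    { "HB3", "2HB" },
    { "2HB", "1HB" },
    { "3HB", "2HB" },
};

/* Returns the PDB name for `name`, or `name` unchanged if the table has no entry. */
const char *to_pdb_name(const NamePair *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].from, name) == 0)
            return table[i].to;
    }
    return name;
}
```

Note that 2HB appears both as a source name (the 2HB/3HB convention) and as a target PDB name, so a single-pass table cannot tell an already-standard 2HB from a nonstandard one. A real converter presumably decides per residue which convention the file uses; this sketch does not attempt that.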
> One additional question: When mmCIF files are delivered from RCSB, are the "_auth_name" records always the PDB standard names, or are they the "author's" names, which might be anything?

Always the PDB standard names.

-----------------

Bob Hanson wrote:
> Thank you very much. Almost got it! See below.
>
> Zukang Feng wrote:
>> Bob Hanson wrote:
>>> One additional question: When mmCIF files are delivered from RCSB, are the "_auth_name" records always the PDB standard names, or are they the "author's" names, which might be anything?
>>
>> Always the PDB standard names.
>
> Is the idea that authors submit presumed "mmCIF" files (or otherwise) to RCSB, but really they have to be fixed up before depositing? So the idea is, this atom_change information allows you to first convert the nonstandard names to standard PDB names for placement in the _atom_site.auth_atom_id field, then (later?) convert them to mmCIF for the _atom_site.label_atom_id field. In this way the _atom_site.auth_atom_id always contains the "standard PDB" name?
>
> Bob Hanson

Yes. It always converts the nonstandard names to standard PDB names first. Later we can automatically convert them to another naming standard in _atom_site.label_atom_id. Currently we put IUPAC standard names in that field. It would be great if the authors fixed them up before depositing, but I think our programs can handle all the situations produced by the most popular refinement programs.
Regards,
Zukang Feng

See you later,
Bob

---------------------------- Original Message ----------------------------
Subject: Re: [Jmol-developers] * in .pdb <-> ' in .cif
From: "Bob Hanson" <[EMAIL PROTECTED]>
Date: Thu, June 17, 2004 14:25
To: [email protected]
--------------------------------------------------------------------------

OK, really it's simple, but you need to know a little chemistry if you want it to make any sense.

What is going on is that some amino acid sidechains have carbons with only one H, some with two, and some with three.

For carbons with one H, there is no issue: the H is simply referred to as, say, HB or HD.

For carbons with three Hs, there is also no issue: they are simply referred to as, say, HD1, HD2, HD3 (old method: 1HD, 2HD, 3HD). If the carbon already has a number, the old 1HD1, 2HD1 become HD11, HD12.

The same goes for NH2 sidechains (Asn, Arg, Gln). No problem here: same numbering.

So far, no problems, right? But when a carbon bears two Hs, the nomenclature has changed. Whereas before the numbering was a HYDROGEN count (HB1, HB2), now it is a SUBSTITUENT count (HB2, HB3), with "1" missing because that number is "reserved" for the rest of the sidechain, which gets its numbering a different way.

What they've implemented in pdb-extract is a relatively simple look-up table that does both the renumbering and the repositioning of the numbers at the same time.
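The renumbering and repositioning just described can be sketched in C as one rule: move the leading digit to the end, adding 1 first if the H sits on a CH2 carbon. This is a hypothetical helper (`pdb_h_to_iupac`), not what pdb-extract does (it uses explicit look-up tables). It assumes the caller already knows whether the carbon is a CH2, for example from a per-residue table, and it ignores special cases such as proline's HT1/HT2.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: converts an old-style PDB hydrogen name such as
 * "1HB" or "2HD1" to the IUPAC style ("HB2", "HD12").
 * is_ch2: nonzero if the H is on a CH2 carbon, in which case the hydrogen
 * count (1, 2) becomes a substituent count (2, 3).
 * Names without a leading digit are copied unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
int pdb_h_to_iupac(const char *pdb, int is_ch2, char *out, size_t outlen)
{
    size_t len = strlen(pdb);
    if (len >= 2 && isdigit((unsigned char)pdb[0])) {
        int n = (pdb[0] - '0') + (is_ch2 ? 1 : 0);
        /* move the (possibly incremented) leading digit to the end */
        int written = snprintf(out, outlen, "%s%d", pdb + 1, n);
        return (written < 0 || (size_t)written >= outlen) ? -1 : 0;
    }
    if (len + 1 > outlen)
        return -1;
    memcpy(out, pdb, len + 1);
    return 0;
}
```

For example, 1HB on a CH2 carbon becomes HB2, while 2HD1 on a methyl carbon becomes HD12.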
If you handle the repositioning yourself, the look-up table is simpler:

  CHANGE_H_NUM CH2_ALA = {}
  CHANGE_H_NUM CH2_ARG = {"HB", "HG"}
  CHANGE_H_NUM CH2_ASP = {"HB"}
  CHANGE_H_NUM CH2_ASN = {"HB"}
  CHANGE_H_NUM CH2_CYS = {"HB"}
  CHANGE_H_NUM CH2_GLN = {"HB", "HG"}
  CHANGE_H_NUM CH2_GLU = {"HB", "HG"}
  CHANGE_H_NUM CH2_GLY = {"HA"}
  CHANGE_H_NUM CH2_HIS = {"HB"}
  CHANGE_H_NUM CH2_ILE = {"HG1"}
  CHANGE_H_NUM CH2_LEU = {"HB"}
  CHANGE_H_NUM CH2_LYS = {"HB", "HD", "HE", "HG"}
  CHANGE_H_NUM CH2_MET = {"HB", "HG"}
  CHANGE_H_NUM CH2_PHE = {"HB"}
  CHANGE_H_NUM CH2_PRO = {"HB", "HD", "HG", "H"}
  CHANGE_H_NUM CH2_SER = {"HB"}
  CHANGE_H_NUM CH2_THR = {}
  CHANGE_H_NUM CH2_TRP = {"HB"}
  CHANGE_H_NUM CH2_TYR = {"HB"}
  CHANGE_H_NUM CH2_VAL = {}

(Note that "HT" will also become just "H" in PRO. I have to admit, I don't quite understand what is going on with proline. Do "HT1" and "HT2" refer to the hydrogens on the N? If so, this does make sense: there is a carbon on this N, so the IUPAC H numbering should start with 2.)

Here's the pdb-extract code:

  CHANGE_ATOM_NAME_23_ONLY IU_PRO = { 8, {
      { "HT1", "H2" },  { "HT2", "H3" },
      { "1HB", "HB2" }, { "2HB", "HB3" },
      { "1HG", "HG2" }, { "2HG", "HG3" },
      { "1HD", "HD2" }, { "2HD", "HD3" } } };

--------------------

>> A,C,T,G,I:
>>
>>   OnP becomes OPn
>>   2HO* becomes HO2'
>>   1X* becomes X'
>>   2X* becomes X''
>>   "5M" becomes "7"
>>   nXm becomes Xmn
>>   * becomes '
>>
>> in that order.
>
> That makes sense.

Add "U" to that list. It's: A C T G U I

>> Then, in filterlib-v8\include\_atom_change_global.h we have seven fundamental
>> models which are mapped to more interesting groups such as DAR, DAS, DCY.
>
> Based upon my relatively short experience working with this data, if they are using group names then they are doomed to failure.

Now, now. Every group must have a relatively arbitrary assignment of atom names. That's what updates are for.
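The ordered A C T G U I rules quoted above can be sketched in C under one reading of the placeholders: n and m are single digits, X is whatever sits between them, and each rule is applied in turn to the result of the previous one. That reading is an assumption, as is the function name `nucleotide_pdb_to_cif`; it does reproduce the thymine case from Zukang's answer (1H5M -> 1H7 -> H71).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define NAME_MAX_LEN 16

/*
 * Hypothetical helper applying the A C T G U I rules in order.
 * Every rule keeps or shortens the name, so `out` needs room for
 * strlen(in) + 1 chars. Returns 0 on success, -1 if `in` is too long.
 */
int nucleotide_pdb_to_cif(const char *in, char *out)
{
    char s[NAME_MAX_LEN];
    size_t len = strlen(in);
    if (len >= NAME_MAX_LEN)
        return -1;
    memcpy(s, in, len + 1);

    /* Rule 1: OnP -> OPn (O1P -> OP1) */
    if (len == 3 && s[0] == 'O' && isdigit((unsigned char)s[1]) && s[2] == 'P') {
        s[2] = s[1];
        s[1] = 'P';
    }

    /* Rule 2: 2HO* -> HO2' (must run before the generic 2X* rule) */
    if (strcmp(s, "2HO*") == 0)
        strcpy(s, "HO2'");

    /* Rules 3 and 4: 1X* -> X' and 2X* -> X'' (leading digit = number of primes) */
    len = strlen(s);
    if (len >= 3 && (s[0] == '1' || s[0] == '2') && s[len - 1] == '*') {
        int primes = s[0] - '0';
        memmove(s, s + 1, len - 2);   /* keep X, drop the digit and the '*' */
        len -= 2;
        for (int i = 0; i < primes; i++)
            s[len++] = '\'';
        s[len] = '\0';
    }

    /* Rule 5: "5M" -> "7" (C5M -> C7, 1H5M -> 1H7) */
    char *p = strstr(s, "5M");
    if (p != NULL) {
        *p = '7';
        memmove(p + 1, p + 2, strlen(p + 2) + 1);
    }

    /* Rule 6: nXm -> Xmn (1H7 -> H71, 2H6 -> H62) */
    len = strlen(s);
    if (len >= 3 && isdigit((unsigned char)s[0]) && isdigit((unsigned char)s[len - 1])) {
        char n = s[0];
        memmove(s, s + 1, len - 1);
        s[len - 1] = n;
    }

    /* Rule 7: every remaining asterisk becomes a prime (C1* -> C1') */
    for (char *q = s; *q != '\0'; q++) {
        if (*q == '*')
            *q = '\'';
    }

    strcpy(out, s);
    return 0;
}
```

The order matters: if the generic 2X* rule ran before the 2HO* rule, 2HO* would come out as HO'' instead of HO2'.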
For all the "standard" monomers, there is no real problem, and the list of "multiple atom" changes is not that long:

  static CHANGE_POOLS multiple_atoms[MULTIPLE_ATOMS] = {
      "A", &M_N4, "C", &M_N4, "G", &M_N4, "T", &M_N4,
      "U", &M_N4,   /* NO "I" HERE? maybe a mistake. */
      "GLY", &M_02,
      "ASN", &M_04, "ASP", &M_04, "CYS", &M_04, "DAS", &M_04,
      "DCY", &M_04, "DLE", &M_04, "DPN", &M_04, "DSN", &M_04,
      "DSP", &M_04, "DTR", &M_04, "DTY", &M_04, "HIS", &M_04,
      "LEU", &M_04, "PHE", &M_04, "SER", &M_04, "TRP", &M_04,
      "TYR", &M_04,
      "DIL", &M_06, "ILE", &M_06,
      "DGL", &M_08, "DGN", &M_08, "GLN", &M_08, "GLU", &M_08, "MET", &M_08,
      "ARG", &M_12, "DAR", &M_12, "DPR", &M_12, "PRO", &M_12,
      "DLY", &M_16, "LYS", &M_16,
  };

OK, maybe it's a bit long. I'm presuming you have these .h files, right? If not, get the pdb-extract source and take a look; the above won't make any sense without them. Take, for example, model #4:

  static CHANGE_ATOM_NAME M_04 = { 4, {
      { "HB2", "1HB" }, { "HB3", "2HB" },
      { "2HB", "1HB" }, { "3HB", "2HB" } } };

This looks to me to be showing how two different presumed mmCIF formats might both be returned to the old PDB format. That could certainly be a complication when reading CIF files. But again, it only affects files with H atoms, which are usually NMR files.

>> I see from mmCIF output from the RCSB that BOTH the IUPAC and PDB names ("auth")
>> are there. In actuality, it would appear that different authors have different
>> conventions in PDB names (and maybe even CIF names), but from the above it also
>> appears that even CIF name formats have evolved or are somewhat
>> nonstandard.
>>
>> _atom_site.group_PDB
>> _atom_site.id
>> _atom_site.type_symbol
>> _atom_site.label_atom_id
>> _atom_site.label_alt_id
>> _atom_site.label_comp_id
>> _atom_site.label_asym_id
>> _atom_site.label_entity_id
>> _atom_site.label_seq_id
>> _atom_site.pdbx_PDB_ins_code
>> _atom_site.Cartn_x
>> _atom_site.Cartn_y
>> _atom_site.Cartn_z
>> ...
>> _atom_site.auth_seq_id
>> _atom_site.auth_comp_id
>> _atom_site.auth_asym_id
>> _atom_site.auth_atom_id
>> _atom_site.pdbx_PDB_model_num
>>
>> ATOM 1 N N . LEU A 1 1 ... 3 LEU A N 1
>> ATOM 2 C CA . LEU A 1 1 ... 3 LEU A CA 1
>> ATOM 3 C C . LEU A 1 1 ... 3 LEU A C 1
>> ...
>> ATOM 13 H HB2 . LEU A 1 1 ... 3 LEU A 1HB 1
>> ATOM 14 H HB3 . LEU A 1 1 ... 3 LEU A 2HB 1
>>
>> So that translation is a snap, I think.
>
> Glad it looks easy to you :-)

Well, my point is, both names are there. You wouldn't have to use any algorithm at all.

> I also assumed that it would be best if the students of the future used the standard labelling instead of the .pdb * format. So I was hoping to start using the ' names within Jmol.
>
> I now fear that this is too much for me to bite off at this time ... I am going to let it rest for a while.

Actually, if that is ALL you want to do, don't despair. As shown above, the mmCIF files give both numbering schemes. I'm sure this practice will persist well into the future; they really need that double-check, I suspect. The worry I would have is that authors will themselves start using the IUPAC naming, in which case the AUTH names might start being the same as the IUPAC names or be omitted entirely.

Remember, this is JUST FOR THE Hs. How much do you care about them? The other atoms just get a simple algorithm involving some replacements and some rearranging, but those are ONLY for A C G T U I. Not so bad!

Bob

----------------------------------------------------------------------

>Comment By: Bob Hanson (hansonr)
Date: 2006-08-22 17:42

Message:
Logged In: YES
user_id=1082841

This was fixed in Jmol 11.

----------------------------------------------------------------------
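Bob's point that both names are already present in every _atom_site row can be illustrated with a small C sketch that locates _atom_site.label_atom_id and _atom_site.auth_atom_id in the loop header and reads both from one row. It assumes whitespace-separated, unquoted values; real mmCIF files quote values that contain primes (for example "O5'"), so a production reader should use a proper CIF tokenizer. The helper names `column_of` and `atom_names` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 32
#define MAX_TOKEN_LEN 16

/* Returns the column index of `field` in the _atom_site loop header, or -1. */
static int column_of(const char *const *fields, int nfields, const char *field)
{
    for (int i = 0; i < nfields; i++) {
        if (strcmp(fields[i], field) == 0)
            return i;
    }
    return -1;
}

/*
 * Hypothetical helper: splits one _atom_site row on whitespace and copies the
 * IUPAC name (label_atom_id) and the PDB name (auth_atom_id) into label/auth,
 * which must each hold MAX_TOKEN_LEN chars. Assumes unquoted values of at
 * most 15 chars. Returns 0 on success, -1 if either column is missing from
 * the header or the row is too short.
 */
int atom_names(const char *const *fields, int nfields, const char *row,
               char *label, char *auth)
{
    char tokens[MAX_TOKENS][MAX_TOKEN_LEN];
    int ntok = 0;
    int consumed = 0;
    const char *p = row;

    /* %n records how many chars this call consumed, so p can advance */
    while (ntok < MAX_TOKENS && sscanf(p, "%15s%n", tokens[ntok], &consumed) == 1) {
        p += consumed;
        ntok++;
    }

    int li = column_of(fields, nfields, "_atom_site.label_atom_id");
    int ai = column_of(fields, nfields, "_atom_site.auth_atom_id");
    if (li < 0 || ai < 0 || li >= ntok || ai >= ntok)
        return -1;

    strcpy(label, tokens[li]);
    strcpy(auth, tokens[ai]);
    return 0;
}
```

With a header trimmed to the non-coordinate columns listed in the thread, the row "ATOM 13 H HB2 . LEU A 1 1 3 LEU A 1HB 1" yields HB2 as the label name and 1HB as the auth name, with no renaming algorithm involved.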
_______________________________________________
Jmol-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jmol-developers
