Geoff,

(Christian, please read too and comment on the MM2 and MMFF94 atom types in the CDK... thanx!)

On Wednesday 03 January 2007 01:42, Geoffrey Hutchison wrote:
> > the atom type list from OB has been on BODR SVN for some time now,
> > and I want to make a start with converting this to CML so that I can add
> > it to the BODR releases.
>
> Great! The files themselves typically have a few comment lines at the
> top to explain their nature. If these don't help, I've provided more
> below.

OK, thanx.

> (And if the comment lines aren't a good explanation, please
> let me know so I can change them -- I've added these to help document
> the files.)

I need to work on my thesis today, but will look at it ASAP and report any unclear bits.

> > Could you please explain the four txt files in BODR SVN, i.e.
> > aromatic.txt, atomtyp.txt, bondtyp.txt and types.txt and how they relate?
> > ...
> >
> > types.txt seems to indicate how atom types are interconverted, while
> > atomtyp.txt seems to have the SMARTS queries to perceive atom types.
>
> * Types.txt is a basic lookup table for converting between different
> atom types (e.g., Sybyl to PC Model). I think this is the first, and
> best, example needing standardization. For example, we haven't yet
> coded MM3 or MMFF94 atom types in Open Babel, but these are in CDK,
> right?

Indeed we have MM2 (not MM3) and MMFF94. But not sure how complete those lists are. (cc: Christian Hoppe)

> The remainder include SMARTS patterns, and it's worth mentioning that
> currently Open Babel can involve atom hybridization in SMARTS (e.g.,
> [#6D2^1] => sp-hybridized carbon with 2 connections).

Good.

> * Bondtyp.txt is used in bond order assignment. It consists of a
> SMARTS pattern, followed by a set of triples (index1, index2, order)
> where index1 and index2 are atom indexes into the SMARTS. Assignment
> is done first to last, so specific matches should be done earlier.
> For example allene:
> [#6^2][#6D2^1][#6^2] 0 1 2 1 2 2
> # bond between 0 and 1 becomes double bond, 1 and 2 becomes double bond

Interesting approach.
Much more rule based than what CDK has, but that one fails in certain conditions...

> * Aromatic.txt is used in aromaticity detection. It consists of a
> SMARTS pattern, followed by a minimum and maximum number of pi
> electrons on that atom matching the SMARTS. The last pattern in the
> SMARTS is chosen, so patterns typically go from more general to more
> specific. For example, nitrogen:
> [#7rD2] 1 2
> # nitrogen in a ring with two connections can have 1 or 2 pi electrons

Ack.

> * Atomtyp.txt has three sets of SMARTS for:
> - implicit hybridization (sp, sp2, sp3...)
> - implicit valence (how many bonds should an atom have)
> - external types (determined by other programs)
>
> > About the atom type perception: what input does that require? In
> > the CDK we have several kinds of input, with combinations of:
> > - no explicit hydrogens
> > - explicit hybridization
> > - missing bond orders
> >
> > (Most notably, atom type perception in SMILES is tricky, where
> > hydrogens are implicitly assumed (with unknown rules), missing bond
> > orders, but with explicit hybridization states.)
>
> Atom type perception is tricky. Period. Open Babel has the same sort
> of problems (indeed, I think everything in chemistry does). Depending
> on the source of the atoms, lazy perception in Open Babel does
> various sorts of work. For example, coming from Sybyl Mol2, atom
> types are assumed to be correct(?!).
>
> I think a better question is perhaps to discuss how to generally deal
> with implicit valence. I think there was some discussion a while ago
> about coding an open standard of aromatization. That might be a
> useful goal here too. If so, perhaps we should start a wiki page and
> I'll get some other folks involved from Open Babel.

OK, I will soon start with converting the above tables into XML, in one way or another. I'll start with exploring CML, and keep your setup of files for now.
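
As a rough illustration of the two line formats described above for bondtyp.txt and aromatic.txt, a minimal Python sketch follows. It is not Open Babel's parser. It assumes one rule per line, whitespace-separated fields, and comment lines starting with '#'; the names BondRule, AromaticRule, parse_bond_rule, parse_aromatic_rule and last_matching_rule are hypothetical, and the `matches` callback only stands in for a real SMARTS matcher.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class BondRule(NamedTuple):
    smarts: str
    # Each triple is (index1, index2, order); indexes point into the SMARTS atoms.
    assignments: List[Tuple[int, ...]]


class AromaticRule(NamedTuple):
    smarts: str
    min_pi: int
    max_pi: int


def _fields(line: str) -> Optional[List[str]]:
    """Split a rule line into fields; return None for blank or comment lines.

    Assumption: a comment is a line whose first non-blank character is '#'.
    A '#' inside a SMARTS such as [#6^2] is left alone.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    return stripped.split()


def parse_bond_rule(line: str) -> Optional[BondRule]:
    """Parse 'SMARTS i j order [i j order ...]' as described for bondtyp.txt."""
    fields = _fields(line)
    if fields is None:
        return None
    numbers = [int(f) for f in fields[1:]]
    if not numbers or len(numbers) % 3 != 0:
        raise ValueError(f"expected (index1, index2, order) triples: {line!r}")
    triples = [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]
    return BondRule(fields[0], triples)


def parse_aromatic_rule(line: str) -> Optional[AromaticRule]:
    """Parse 'SMARTS min max' as described for aromatic.txt."""
    fields = _fields(line)
    if fields is None:
        return None
    if len(fields) != 3:
        raise ValueError(f"expected 'SMARTS min max': {line!r}")
    return AromaticRule(fields[0], int(fields[1]), int(fields[2]))


def last_matching_rule(
    rules: List[AromaticRule], matches: Callable[[str], bool]
) -> Optional[AromaticRule]:
    """Return the last rule whose SMARTS matches (general to specific order)."""
    chosen = None
    for rule in rules:
        if matches(rule.smarts):
            chosen = rule  # a later, more specific rule overrides an earlier one
    return chosen


allene = parse_bond_rule("[#6^2][#6D2^1][#6^2] 0 1 2 1 2 2")
print(allene.assignments)  # [(0, 1, 2), (1, 2, 2)]
print(parse_aromatic_rule("[#7rD2] 1 2"))
```

The bond order rules would be applied in the opposite sense (first to last, specific patterns first), so only the aromatic lookup needs the "last match wins" helper.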

Egon

--
[EMAIL PROTECTED]
Cologne University Bioinformatics Center (CUBIC)
Blog: http://chem-bla-ics.blogspot.com/
GPG: 1024D/D6336BA6
_______________________________________________
Blue-obelisk mailing list
Blue-obelisk@hardly.cubic.uni-koeln.de
http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk