Dear Egon,

the atom type list from OB has been on BODR SVN for some time now, and I want to make a start with converting this to CML so that I can add it to the BODR
releases.

Great! The files themselves typically have a few comment lines at the top to explain their nature. If these don't help, I've provided more below. (And if the comment lines aren't a good explanation, please let me know so I can change them -- I've added these to help document the files.)

Could you please explain the four txt files in BODR SVN, i.e. aromatic.txt,
atomtyp.txt, bondtyp.txt and types.txt and how they relate?
...
types.txt seems to indicate how atom types are interconverted, while
atomtyp.txt seems to have the SMARTS queries to perceive atom types.

* Types.txt is a basic lookup table for converting between different atom types (e.g., Sybyl to PC Model). I think this is the first, and best, example needing standardization. For example, we haven't yet coded MM3 or MMFF94 atom types in Open Babel, but these are in CDK, right?
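As a rough illustration of the lookup-table idea, here is a minimal Python sketch. It assumes a whitespace-separated table whose first row names the type systems. The column names and type labels below are hypothetical and are not copied from types.txt.

```python
# Hypothetical table: column names and type labels are made up for
# illustration and do NOT reproduce the contents of types.txt.
TABLE = """\
# comment lines are skipped
SYB  PCM  ELEM
C.3  C1   C
C.2  C2   C
N.3  N1   N
"""

def load_table(text):
    """Split the table into a header row and a list of data rows."""
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.strip().startswith("#")]
    return lines[0], lines[1:]

def convert(atom_type, source, target, header, rows):
    """Translate atom_type from the source column to the target column.

    Returns None when the type is not listed in the source column.
    """
    src = header.index(source)
    dst = header.index(target)
    for row in rows:
        if row[src] == atom_type:
            return row[dst]
    return None

header, rows = load_table(TABLE)
print(convert("C.2", "SYB", "PCM", header, rows))
```

Because every row lists the same atom in each naming scheme, the same table serves conversions in either direction.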

The remaining files contain SMARTS patterns, and it's worth mentioning that Open Babel's SMARTS currently supports atom hybridization (e.g., [#6D2^1] => sp-hybridized carbon with 2 connections).

* Bondtyp.txt is used in bond order assignment. Each line consists of a SMARTS pattern, followed by a set of triples (index1, index2, order) where index1 and index2 are atom indexes into the SMARTS. Assignment is done first to last, so more specific patterns should come earlier. For example, allene:
[#6^2][#6D2^1][#6^2]            0 1 2 1 2 2
# bond between 0 and 1 becomes double bond, 1 and 2 becomes double bond
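
To make the line layout concrete, here is a minimal Python sketch that splits such a line into its SMARTS pattern and (index1, index2, order) triples. The function name is hypothetical, and treating lines that start with "#" as comments is an assumption based on the example above. It only parses the text and does no SMARTS matching.

```python
def parse_bondtyp_line(line):
    """Return (smarts, [(index1, index2, order), ...]) or None for non-rules.

    Assumes the SMARTS is the first whitespace-separated field and that
    blank lines and lines starting with '#' carry no rule.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    smarts = fields[0]
    numbers = [int(x) for x in fields[1:]]
    if len(numbers) % 3 != 0:
        raise ValueError("expected (index1, index2, order) triples: " + line)
    # Group the flat list of integers three at a time.
    triples = [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]
    return smarts, triples

allene = parse_bondtyp_line("[#6^2][#6D2^1][#6^2]            0 1 2 1 2 2")
print(allene)
```

For the allene line this gives the triples (0, 1, 2) and (1, 2, 2), i.e. both bonds become double bonds.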

* Aromatic.txt is used in aromaticity detection. Each line consists of a SMARTS pattern, followed by the minimum and maximum number of pi electrons for the atom matching the SMARTS. If several patterns match an atom, the last matching pattern in the file is chosen, so patterns typically go from more general to more specific. For example, nitrogen:
[#7rD2]                 1       2
# nitrogen in a ring with two connections can have 1 or 2 pi electrons
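
A minimal Python sketch of how such rules could be read and applied, interpreting "the last pattern is chosen" as "the last matching rule wins". The function names are hypothetical, and the `matches` callable is a stand-in for a real SMARTS matcher, which is not implemented here.

```python
def parse_aromatic_line(line):
    """Return (smarts, min_pi, max_pi), or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    smarts, min_pi, max_pi = line.split()
    return smarts, int(min_pi), int(max_pi)

def pi_electron_range(rules, matches):
    """Pick the pi-electron range for one atom.

    rules   -- list of (smarts, min_pi, max_pi) in file order
    matches -- callable taking a SMARTS string and returning True if the
               atom matches it (placeholder for a real SMARTS engine)
    """
    chosen = None
    for smarts, min_pi, max_pi in rules:
        if matches(smarts):
            chosen = (min_pi, max_pi)  # a later match overrides an earlier one
    return chosen

rule = parse_aromatic_line("[#7rD2]                 1       2")
print(rule)
```

Scanning the whole list and keeping the last hit is what lets general rules sit at the top of the file and be overridden by more specific ones further down.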

* Atomtyp.txt has three sets of SMARTS for:
- implicit hybridization (sp, sp2, sp3...)
- implicit valence (how many bonds should an atom have)
- external types (determined by other programs)

About the atom type perception: what input does that require? In the CDK we
have several kinds of input, with combinations of:
- no explicit hydrogens
- explicit hybridization
- missing bond orders

(Most notably, atom type perception in SMILES is tricky, where hydrogens are
implicitly assumed (with unknown rules), missing bond orders, but with
explicit hybridization states.)

Atom type perception is tricky. Period. Open Babel has the same sort of problems (indeed, I think everything in chemistry does). Depending on the source of the atoms, lazy perception in Open Babel does various sorts of work. For example, coming from Sybyl Mol2, atom types are assumed to be correct(?!).

I think a better question is perhaps to discuss how to generally deal with implicit valence. I think there was some discussion a while ago about coding an open standard of aromatization. That might be a useful goal here too. If so, perhaps we should start a wiki page and I'll get some other folks involved from Open Babel.

Cheers,
-Geoff