Dear Egon,
the atom type list from OB has been on BODR SVN for some time now, and I
want to make a start with converting this to CML so that I can add it to
the BODR releases.
Great! The files themselves typically have a few comment lines at the
top to explain their nature. If these don't help, I've provided more
below. (And if the comment lines aren't a good explanation, please
let me know so I can change them -- I've added these to help document
the files.)
Could you please explain the four txt files in BODR SVN, i.e.
aromatic.txt, atomtyp.txt, bondtyp.txt and types.txt, and how they relate?
...
types.txt seems to indicate how atom types are interconverted, while
atomtyp.txt seems to have the SMARTS queries to perceive atom types.
* Types.txt is a basic lookup table for converting between different
atom types (e.g., Sybyl to PC Model). I think this is the first, and
best, example needing standardization. For example, we haven't yet
coded MM3 or MMFF94 atom types in Open Babel, but these are in CDK,
right?
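
As a rough illustration of the lookup-table idea (not Open Babel's actual
code), the conversion could be sketched as below. The layout is assumed:
a header row naming each type system, then one row per atom type giving
the equivalent label in every system. The column names and labels are
placeholders, not copied from the real types.txt.

```python
# Placeholder table: the header names the type systems, each later row
# lists equivalent labels. None of this is real types.txt data.
TABLE_TEXT = """\
SYS_A SYS_B SYS_C
C.3   C3    1
C.2   C2    2
N.3   N3    8
"""


def load_table(text):
    """Return (header, rows) from whitespace-separated text, skipping '#' lines."""
    lines = [
        ln.split()
        for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    return lines[0], lines[1:]


def translate(label, source, target, header, rows):
    """Map `label` from the `source` column to the `target` column, or None."""
    src, dst = header.index(source), header.index(target)
    for row in rows:
        if row[src] == label:
            return row[dst]
    return None


header, rows = load_table(TABLE_TEXT)
print(translate("C.3", "SYS_A", "SYS_B", header, rows))  # prints C3
```

Adding a new type system in this scheme would only mean adding a column,
which is part of why a shared, standardized table is attractive.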
The remaining files contain SMARTS patterns, and it's worth mentioning
that Open Babel currently lets SMARTS refer to atom hybridization (e.g.,
[#6D2^1] => sp-hybridized carbon with 2 connections).
* Bondtyp.txt is used in bond order assignment. Each entry consists of a
SMARTS pattern, followed by a set of triples (index1, index2, order)
where index1 and index2 are atom indexes into the SMARTS. Assignment
is done first to last, so more specific patterns should come earlier.
For example, allene:
[#6^2][#6D2^1][#6^2] 0 1 2 1 2 2
# the bond between atoms 0 and 1 becomes a double bond, as does the bond between atoms 1 and 2
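
A minimal sketch of how such a line could be split into its pattern and
triples, assuming whitespace-separated fields and a SMARTS string with no
embedded spaces. The function name is made up for illustration; this is
not Open Babel's parser.

```python
def parse_bond_rule(line):
    """Split 'SMARTS i j order [i j order ...]' into (smarts, triples)."""
    fields = line.split()
    smarts = fields[0]
    numbers = [int(x) for x in fields[1:]]
    if not numbers or len(numbers) % 3 != 0:
        raise ValueError("expected one or more (index1, index2, order) triples")
    # Group the flat list of integers three at a time.
    triples = [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]
    return smarts, triples


smarts, triples = parse_bond_rule("[#6^2][#6D2^1][#6^2] 0 1 2 1 2 2")
print(smarts)   # prints [#6^2][#6D2^1][#6^2]
print(triples)  # prints [(0, 1, 2), (1, 2, 2)]
```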
* Aromatic.txt is used in aromaticity detection. Each entry consists of a
SMARTS pattern, followed by a minimum and maximum number of pi
electrons on the atom matching the SMARTS. The last matching pattern
is chosen, so patterns typically go from more general to more
specific. For example, nitrogen:
[#7rD2] 1 2
# nitrogen in a ring with two connections can have 1 or 2 pi electrons
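
The "last match wins" rule can be sketched as follows. This is only an
illustration: the `matches` predicate stands in for a real SMARTS matcher
and is hypothetical, and the first rule in the list is made up to show a
general pattern being overridden by the more specific one quoted above.

```python
def parse_aromatic_rule(line):
    """Split 'SMARTS min max' into (smarts, min_pi, max_pi)."""
    smarts, lo, hi = line.split()
    return smarts, int(lo), int(hi)


def pi_electron_range(rules, matches):
    """Return (min, max) from the last rule whose SMARTS matches, else None.

    `matches` is a caller-supplied predicate standing in for a real
    SMARTS matcher; it is not an Open Babel function.
    """
    result = None
    for smarts, lo, hi in rules:
        if matches(smarts):
            result = (lo, hi)  # later, more specific rules override earlier ones
    return result


rules = [
    parse_aromatic_rule("[#7r] 1 1"),    # made-up general rule
    parse_aromatic_rule("[#7rD2] 1 2"),  # the rule quoted above
]

# Pretend the atom is a ring nitrogen with two connections, so both match.
print(pi_electron_range(rules, lambda s: s in {"[#7r]", "[#7rD2]"}))  # prints (1, 2)
```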
* Atomtyp.txt has three sets of SMARTS for:
- implicit hybridization (sp, sp2, sp3...)
- implicit valence (how many bonds should an atom have)
- external types (determined by other programs)
About the atom type perception: what input does that require? In the CDK
we have several kinds of input, with combinations of:
- no explicit hydrogens
- explicit hybridization
- missing bond orders
(Most notably, atom type perception in SMILES is tricky, where hydrogens
are implicitly assumed (with unknown rules) and bond orders are missing,
but hybridization states are explicit.)
Atom type perception is tricky. Period. Open Babel has the same sort
of problems (indeed, I think everything in chemistry does). Depending
on the source of the atoms, lazy perception in Open Babel does
various sorts of work. For example, coming from Sybyl Mol2, atom
types are assumed to be correct(?!).
I think a better question is perhaps to discuss how to generally deal
with implicit valence. I think there was some discussion a while ago
about coding an open standard of aromatization. That might be a
useful goal here too. If so, perhaps we should start a wiki page and
I'll get some other folks involved from Open Babel.
Cheers,
-Geoff
_______________________________________________
Blue-obelisk mailing list
Blue-obelisk@hardly.cubic.uni-koeln.de
http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk