Geoff,

(Christian, please read too and comment on the MM2 and MMFF94 atom types in 
the CDK... thanx!)

On Wednesday 03 January 2007 01:42, Geoffrey Hutchison wrote:
> > the atom type list from OB has been on BODR SVN for some time now,
> > and I want to make a start with converting this to CML so that I can add
> > it to the BODR releases.
>
> Great! The files themselves typically have a few comment lines at the
> top to explain their nature. If these don't help, I've provided more
> below. 

OK, thanx.

> (And if the comment lines aren't a good explanation, please 
> let me know so I can change them -- I've added these to help document
> the files.)

I need to work on my thesis today, but I will look at it asap and report any
unclear bits.

> > Could you please explain the four txt files in BODR SVN, i.e.
> > aromatic.txt, atomtyp.txt, bondtyp.txt and types.txt, and how they relate?
>
> ...
>
> > types.txt seems to indicate how atom types are interconverted, while
> > atomtyp.txt seems to have the SMARTS queries to perceive atom types.
>
> * Types.txt is a basic lookup table for converting between different
> atom types (e.g., Sybyl to PC Model). I think this is the first, and
> best, example needing standardization. For example, we haven't yet
> coded MM3 or MMFF94 atom types in Open Babel, but these are in CDK,
> right?

Indeed, we have MM2 (not MM3) and MMFF94, but I am not sure how complete those
lists are. (cc: Christian Hoppe)
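
To make the lookup-table idea concrete, here is a minimal Python sketch of such a conversion. It assumes a whitespace-separated table whose first row names the type systems and whose later rows list equivalent types; that layout, the column labels SYB and MM2, the sample rows, and the function names are illustrative assumptions, not the actual contents of types.txt.

```python
def parse_type_table(text):
    """Split a whitespace-separated table into a header row and body rows.

    Assumed layout (hypothetical): first non-comment row names the type
    systems, every later row lists the equivalent type in each system.
    """
    rows = [ln.split() for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]
    return rows[0], rows[1:]


def translate(header, body, src, dst, atom_type):
    """Return the dst-system type equivalent to atom_type, or None."""
    i, j = header.index(src), header.index(dst)
    for row in body:
        if row[i] == atom_type:
            return row[j]
    return None


# Hypothetical excerpt, for illustration only.
SAMPLE = """\
# type-system columns (assumed labels)
SYB MM2
C.3 1
C.2 2
"""

header, body = parse_type_table(SAMPLE)
print(translate(header, body, "SYB", "MM2", "C.3"))  # prints 1
```

A CML version of the table would only need to replace parse_type_table; the lookup itself stays the same.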

> The remainder include SMARTS patterns, and it's worth mentioning that
> currently Open Babel can involve atom hybridization in SMARTS (e.g.,
> [#6D2^1] => sp-hybridized carbon with 2 connections).

Good.

> * Bondtyp.txt is used in bond order assignment. It consists of a
> SMARTS pattern, followed by a set of triples (index1, index2, order)
> where index1 and index2 are atom indexes into the SMARTS. Assignment
> is done first to last, so specific matches should be done earlier.
> For example allene:
> [#6^2][#6D2^1][#6^2]            0 1 2 1 2 2
> # bond between 0 and 1 becomes double bond, 1 and 2 becomes double bond

Interesting approach. It is much more rule-based than what the CDK has, and
the CDK approach fails under certain conditions...
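
As a sketch of the line format described above (a SMARTS pattern followed by index1/index2/order triples), the following Python parses one rule. It is not Open Babel's code; the function name parse_bond_rule is made up for illustration, and it assumes the SMARTS contains no whitespace.

```python
def parse_bond_rule(line):
    """Parse 'SMARTS i1 j1 order1 i2 j2 order2 ...' into (smarts, triples).

    Each triple is (index1, index2, order), where the indexes refer to
    atoms in the SMARTS pattern. Hypothetical helper, not Open Babel API.
    """
    fields = line.split()
    smarts = fields[0]
    numbers = [int(x) for x in fields[1:]]
    if len(numbers) % 3 != 0:
        raise ValueError("expected a whole number of (i, j, order) triples")
    triples = [tuple(numbers[k:k + 3]) for k in range(0, len(numbers), 3)]
    return smarts, triples


# The allene rule quoted above.
smarts, triples = parse_bond_rule("[#6^2][#6D2^1][#6^2]            0 1 2 1 2 2")
print(smarts)   # prints [#6^2][#6D2^1][#6^2]
print(triples)  # prints [(0, 1, 2), (1, 2, 2)]
```

Both triples carry order 2, which matches the comment in the rule: the bonds 0-1 and 1-2 both become double bonds.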

> * Aromatic.txt is used in aromaticity detection. It consists of a
> SMARTS pattern, followed by a minimum and maximum number of pi
> electrons on that atom matching the SMARTS. The last matching pattern
> is chosen, so patterns typically go from more general to more
> specific. For example, nitrogen:
> [#7rD2]                 1       2
> # nitrogen in a ring with two connections can have 1 or 2 pi electrons

Ack.
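
The "last match wins" rule can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are made up, the general rule "[#7r] 1 1" is invented to show the ordering (only the [#7rD2] line comes from the file), and real SMARTS matching is replaced by a stand-in callable that a toolkit would have to supply.

```python
def parse_aromatic_rule(line):
    """Parse 'SMARTS min max' into (smarts, min_pi, max_pi)."""
    smarts, lo, hi = line.split()
    return smarts, int(lo), int(hi)


def electron_range(rules, atom_matches):
    """Return (min_pi, max_pi) from the LAST rule that matches, or None.

    atom_matches is a caller-supplied callable taking a SMARTS string;
    it stands in for a real SMARTS matcher (an assumption of this sketch).
    """
    result = None
    for smarts, lo, hi in rules:
        if atom_matches(smarts):
            result = (lo, hi)  # later, more specific rules override
    return result


rules = [
    parse_aromatic_rule("[#7r]    1  1"),            # invented general rule
    parse_aromatic_rule("[#7rD2]                 1       2"),  # from the file
]

# Stand-in matcher for a two-connected ring nitrogen: both patterns match.
ring_n_d2 = lambda smarts: smarts in {"[#7r]", "[#7rD2]"}
print(electron_range(rules, ring_n_d2))  # prints (1, 2)
```

Because the loop keeps overwriting the result, ordering the file from general to specific is what makes the specific pattern take effect.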

> * Atomtyp.txt has three sets of SMARTS for:
> - implicit hybridization (sp, sp2, sp3...)
> - implicit valence (how many bonds should an atom have)
> - external types (determined by other programs)
>
> > About the atom type perception: what input does that require? In the CDK
> > we have several kinds of input, with combinations of:
> > - no explicit hydrogens
> > - explicit hybridization
> > - missing bond orders
> >
> > (Most notably, atom type perception from SMILES is tricky: hydrogens are
> > implicitly assumed (with unknown rules) and bond orders are missing, but
> > hybridization states are explicit.)
>
> Atom type perception is tricky. Period. Open Babel has the same sort
> of problems (indeed, I think everything in chemistry does). Depending
> on the source of the atoms, lazy perception in Open Babel does
> various sorts of work. For example, coming from Sybyl Mol2, atom
> types are assumed to be correct(?!).
>
> I think a better question is perhaps to discuss how to generally deal
> with implicit valence. I think there was some discussion a while ago
> about coding an open standard of aromatization. That might be a
> useful goal here too. If so, perhaps we should start a wiki page and
> I'll get some other folks involved from Open Babel.

OK, I will soon start with converting the above tables into XML, in one way or 
another. I'll start with exploring CML, and keep your setup of files for now.

Egon

-- 
[EMAIL PROTECTED]
Cologne University Bioinformatics Center (CUBIC)
Blog: http://chem-bla-ics.blogspot.com/
GPG: 1024D/D6336BA6
_______________________________________________
Blue-obelisk mailing list
Blue-obelisk@hardly.cubic.uni-koeln.de
http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk
