Hi Rajarshi,

On 9/15/07, Rajarshi Guha <[EMAIL PROTECTED]> wrote:
> I think the issue regarding valency is related to atom typing - Egon
> might be better able to comment on this. One test - make a SMILES
> with explicit H's and see if that works. I think I faced this problem
> with implicit H's.

The current state is as follows:

Many C, N, S, O, and P atom types are now perceived, including both
charged and uncharged versions. Atom typing is quite specific, which
leads to many atom types; keep in mind that the following
characteristics are currently defined for each atom type:

- number of neighbors (needed for adding hydrogens)
- hybridization (needed for various things, such as bond order fixing
and aromaticity)
- number of lone pairs (which I hope will be useful for Miguel's MS
fragmentation work)
- number of pi bonds
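To make the list above concrete, here is a minimal Python sketch of an atom type record. It carries those four characteristics plus the element and formal charge. The type names (such as "C.sp3"), the small table, and the `lookup` helper are hypothetical illustrations only. They are not the actual CDK classes, which are written in Java, and the real list is much larger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomType:
    """Hypothetical atom type record; fields mirror the characteristics listed above."""
    name: str
    element: str
    formal_charge: int
    neighbor_count: int   # total neighbors, hydrogens included
    hybridization: str
    lone_pairs: int
    pi_bonds: int


# Tiny illustrative table of neutral types; the real list also covers charged types.
ATOM_TYPES = [
    AtomType("C.sp3", "C", 0, 4, "sp3", 0, 0),
    AtomType("C.sp2", "C", 0, 3, "sp2", 0, 1),
    AtomType("C.sp", "C", 0, 2, "sp", 0, 2),
    AtomType("O.sp3", "O", 0, 2, "sp3", 2, 0),
    AtomType("O.sp2", "O", 0, 1, "sp2", 2, 1),
    AtomType("N.sp3", "N", 0, 3, "sp3", 1, 0),
    AtomType("N.sp", "N", 0, 1, "sp", 1, 2),
]


def lookup(element: str, formal_charge: int, pi_bonds: int) -> Optional[AtomType]:
    """Return the first type matching element, charge and pi bond count, or None."""
    for atom_type in ATOM_TYPES:
        if (atom_type.element == element
                and atom_type.formal_charge == formal_charge
                and atom_type.pi_bonds == pi_bonds):
            return atom_type
    return None  # unperceived: no type in the table fits
```

For example, `lookup("O", 0, 1)` gives the carbonyl oxygen type with one neighbor and two lone pairs. `lookup("S", 0, 0)` returns None because this toy table has no sulfur types. That is the kind of gap a coverage test would expose.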

The number of pi bonds works as follows (SMILES examples in
parentheses): in a carbonyl (C=O), the C and the O each have one pi
bond. In acetonitrile (CC#N), the nitrile C and the N each have two pi
bonds. In something like C=C=C, the central C has two pi bonds too.
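The counting rule above amounts to summing, over each atom's bonds, the bond order minus one, because the first bond between two atoms is the sigma bond. Below is a minimal Python sketch that assumes a plain list of (atom, atom, order) bonds. It does not parse real SMILES and does not use CDK classes. `pi_bond_counts` is a hypothetical helper, and aromatic bonds are deliberately not handled.

```python
from typing import List, Tuple

# A bond is (atom_index_1, atom_index_2, bond_order); 1, 2, 3 = single, double, triple.
Bond = Tuple[int, int, int]


def pi_bond_counts(atom_count: int, bonds: List[Bond]) -> List[int]:
    """Count pi bonds per atom: a double bond adds one, a triple bond adds two."""
    counts = [0] * atom_count
    for a, b, order in bonds:
        pi = order - 1  # one sigma bond per atom pair; the rest are pi bonds
        counts[a] += pi
        counts[b] += pi
    return counts


# Heavy atoms only; bonds to hydrogen are single and add no pi bonds.
print(pi_bond_counts(2, [(0, 1, 2)]))             # carbonyl C=O -> [1, 1]
print(pi_bond_counts(3, [(0, 1, 1), (1, 2, 3)]))  # acetonitrile CC#N -> [0, 2, 2]
print(pi_bond_counts(3, [(0, 1, 2), (1, 2, 2)]))  # C=C=C -> [1, 2, 1]
```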

I am not sure whether these characteristics form a complete,
orthogonal set of atom type features, but I will learn along the way.

The commit I am about to make tests the code against all relevant MDL
molfiles in src/data/mdl/, and currently 2207 out of 2212 atom types
are perceived. That's not bad at all, but it does not yet test whether
the perception is correct. For now, dedicated tests in
CDKAtomTypeMatcherTest cover that part, but on the train to Germany
today I will extend the MDL test to check whether the perceived atom
type characterization is compatible with what is found in the MDL
molfile.
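The two checks described above work together. The first measures how many atoms get any atom type at all. The second checks whether each perceived type agrees with what the molfile records. Here is a minimal Python sketch of that bookkeeping. The `AtomRecord` structure and both helpers are hypothetical. As a simplifying assumption, only the formal charge is compared, although a real test could compare more properties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AtomRecord:
    """Hypothetical per-atom record: what the molfile says vs. what perception found."""
    file_charge: int                 # formal charge recorded in the molfile
    perceived_type: Optional[str]    # atom type name, or None if nothing matched
    perceived_charge: Optional[int]  # formal charge defined by the perceived type


def perception_rate(records: List[AtomRecord]) -> float:
    """Fraction of atoms for which any atom type was perceived."""
    if not records:
        return 0.0
    perceived = sum(1 for r in records if r.perceived_type is not None)
    return perceived / len(records)


def incompatible(records: List[AtomRecord]) -> List[int]:
    """Indices of perceived atoms whose type disagrees with the molfile charge."""
    return [i for i, r in enumerate(records)
            if r.perceived_type is not None and r.perceived_charge != r.file_charge]
```

With 2207 perceived atoms out of 2212, `perception_rate` gives about 0.998. The unperceived atoms are not counted as incompatible. They are simply missing types.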

Anyway, it already covers a lot of weird chemistry, and I have already
started to set up a new HydrogenAdder based on the new atom typing
code, along with a few JUnit tests. That is more work for the train
ride.
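A hydrogen adder built on atom typing can use the neighbor count stored in each type. Assuming the type has already been perceived, the number of hydrogens to add is that count minus the explicit neighbors already present. Here is a minimal Python sketch of the idea. The type names, the `NEIGHBOR_COUNT` table, and `implicit_hydrogens` are hypothetical. This is not the CDK HydrogenAdder API.

```python
from typing import Dict

# Hypothetical neighbor counts per atom type (total neighbors, hydrogens included).
NEIGHBOR_COUNT: Dict[str, int] = {
    "C.sp3": 4,
    "C.sp2": 3,
    "C.sp": 2,
    "O.sp3": 2,
    "O.sp2": 1,
    "N.sp3": 3,
    "N.sp": 1,
}


def implicit_hydrogens(atom_type: str, explicit_neighbors: int) -> int:
    """Hydrogens to add: neighbors the type expects minus neighbors already present."""
    missing = NEIGHBOR_COUNT[atom_type] - explicit_neighbors
    if missing < 0:
        raise ValueError(f"{atom_type} cannot have {explicit_neighbors} neighbors")
    return missing
```

For acetonitrile (CC#N), the methyl C ("C.sp3", one explicit neighbor) gets 3 hydrogens. The nitrile C ("C.sp", two neighbors) and the N ("N.sp", one neighbor) get none. Explicit hydrogens in the input simply count as explicit neighbors.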

Atom type perception is at the center of any good chemoinformatics
toolkit, and the new list will be a superset of the various lists we
already have. That is, it will not be dedicated to a force field, a
structure generator, or the hydrogen adder. So far, so good, and
before long I will explain how the CDK community can stress-test the
new implementation.

If you want to start helping already, you certainly can. Please
contact me on IRC, because close coordination is very important at
this stage.

Egon

-- 
----
http://chem-bla-ics.blogspot.com/

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
