Hi Rajarshi,

On 9/15/07, Rajarshi Guha <[EMAIL PROTECTED]> wrote:
> I think the issue regarding valency is related to atom typing - Egon
> might be better able to comment on this. One test - make a SMILES
> with explicit H's and see if that works. I think I faced this problem
> with implicit H's

The current state is as follows:

A lot of C, N, S, O and P atom types are perceived, including charged and uncharged versions. Atom typing is quite specific, which leads to many atom types, but keep in mind that these characteristics are currently defined for each atom type:

- number of neighbors (needed for adding hydrogens)
- hybridization (needed for various things, such as bond order fixing and aromaticity)
- number of lone pairs (which I hope will be useful for Miguel's MS fragmentation work)
- number of pi bonds

The number of pi bonds works as follows (SMILES examples in brackets): in a carbonyl (C=O), C and O each have one pi bond. In acetonitrile (CC#N), the nitrile C and N each have two pi bonds. In something like C=C=C, the center C has two pi bonds too.

I am not confident that these characteristics form a complete, orthogonal set of atom type features, but I will learn along the way.

The commit I will make in a second tests the code against all relevant MDL molfiles in src/data/mdl/, and currently 2207 out of 2212 atom types are perceived. That's not bad at all, but it does not yet test whether the perception is correct. Dedicated tests in CDKAtomTypeMatcherTest do that part at this moment, but on the train to Germany today I will extend the MDL test to see if the found atom type characterization is compatible with what is found in the MDL molfile.

Anyway, it covers a lot of weird chemistry already, and I have already started to set up a new HydrogenAdder based on the new atom typing code, and a few JUnit tests too. More work for the train ride, that is.

Atom type perception is at the center of any good chemoinformatics toolkit, and the new list will be a superset of the various lists we have already. That is, it will not be dedicated to a force field, to a structure generator, to the hydrogen adder...

So far, so good, and before long I will explain how the CDK community can stress test the new implementation. If you want to start helping already, you certainly can.

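To make the pi-bond counting described above concrete, here is a minimal sketch in plain Java. It does not use the CDK API: the class name PiBondSketch, the method countPiBonds and the {atomA, atomB, order} bond encoding are hypothetical and exist only for this illustration. It assumes a bond of order n contributes n - 1 pi bonds to each of its two atoms, and it ignores aromatic bonds and hydrogens.

```java
import java.util.Arrays;

public class PiBondSketch {

    /**
     * Counts pi bonds per atom from a simple bond list.
     * Each bond is {atomIndexA, atomIndexB, order} with order 1, 2 or 3.
     * Assumption: a bond of order n is one sigma bond plus (n - 1) pi bonds.
     */
    public static int[] countPiBonds(int atomCount, int[][] bonds) {
        int[] piBonds = new int[atomCount];
        for (int[] bond : bonds) {
            int pi = bond[2] - 1;      // single bonds contribute 0
            piBonds[bond[0]] += pi;    // both partners share the pi bond(s)
            piBonds[bond[1]] += pi;
        }
        return piBonds;
    }

    public static void main(String[] args) {
        // Carbonyl C=O: atoms C(0), O(1)
        int[] carbonyl = countPiBonds(2, new int[][] {{0, 1, 2}});
        // Acetonitrile CC#N: atoms C(0), C(1), N(2)
        int[] acetonitrile = countPiBonds(3, new int[][] {{0, 1, 1}, {1, 2, 3}});
        // Allene-like C=C=C: atoms C(0), C(1), C(2)
        int[] allene = countPiBonds(3, new int[][] {{0, 1, 2}, {1, 2, 2}});

        System.out.println("C=O   " + Arrays.toString(carbonyl));      // [1, 1]
        System.out.println("CC#N  " + Arrays.toString(acetonitrile));  // [0, 2, 2]
        System.out.println("C=C=C " + Arrays.toString(allene));        // [1, 2, 1]
    }
}
```

The three inputs mirror the SMILES examples given earlier; a real atom typer would combine this count with the neighbor count, hybridization and lone pairs rather than use it on its own.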
Please contact me on IRC, because close coordination is very important at this stage.

Egon

-- 
http://chem-bla-ics.blogspot.com/

_______________________________________________
Cdk-user mailing list
https://lists.sourceforge.net/lists/listinfo/cdk-user

