Hi all,

a few more SMILES bugs have been reported, and most of them are caused by the same bug, or really, the same missing feature in CDK. Let me explain.

1. SMILES with explicit bond orders can be parsed correctly in CDK.
2. SMILES without explicit bond orders and with sp2 hybridized atoms tend to fail.

Ad 1: The SMILES parser is really good. It's not based on EBNF or anything like that (e.g. generated with JavaCC), but it works OK.

Ad 2: SMILES allows one to mark atoms as sp2 hybridized by writing them in lowercase, at least for a subset of atoms, the 'organic subset' it is called, IIRC. This is often combined with leaving out explicit bond orders, e.g. 'c1ccccc1'. That's where the trouble comes from: the SMILES parser is expected to figure out where to put the double bonds, and this is *not* trivial, I repeat, this is *not* trivial. I have written at least three implementations (an extension of what we had earlier, a from-scratch breadth-first algorithm, and a from-scratch depth-first algorithm).

Complicating this even further is atom type perception, which works for most organic compounds but is not a full implementation. If one approaches saturation as a general problem, without inserting code for special cases, one also needs to consider alternative atom types.

Now, the saturation problem is a long-standing problem, and I will try to solve it before summer.

If you want to be on the safe side: don't use implicit information! Have explicit hydrogens and explicit bond orders!

SMILES is abundantly around us, but it is not the final answer. Maybe someone can develop an open-source line notation language that does not have any implicit information... oh wait, isn't that the InChI??

(apologies for the grumpy comments at the end...)

Egon

-- 
[EMAIL PROTECTED]
PhD student on Molecular Representation in Chemometrics
Radboud University Nijmegen
Blog: http://chem-bla-ics.blogspot.com/
http://www.cac.science.ru.nl/people/egonw/
GPG: 1024D/D6336BA6

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
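
To illustrate the bond-placement problem described above for 'c1ccccc1', here is a minimal Python sketch. It is not CDK code and not one of the three implementations mentioned in the message; it is just a plain depth-first backtracking search, and the names ring() and assign_double_bonds() are made up for this example. It assumes every flagged sp2 atom must end up with exactly one double bond, which is a simplification.

```python
def ring(n):
    """Hypothetical helper: adjacency lists for a simple n-membered ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def assign_double_bonds(neighbors, sp2_atoms):
    """Depth-first search for a placement of double bonds.

    neighbors: dict mapping atom index -> list of bonded atom indices
    sp2_atoms: set of atoms that must each get exactly one double bond
               (an assumption of this sketch, not a general rule)
    Returns a set of frozenset({a, b}) pairs, or None if no placement exists.
    """
    atoms = sorted(sp2_atoms)
    partner = {}  # atom -> atom it shares a double bond with

    def solve(i):
        # Skip atoms that already received a double bond.
        while i < len(atoms) and atoms[i] in partner:
            i += 1
        if i == len(atoms):
            return True  # every sp2 atom is saturated
        a = atoms[i]
        for b in neighbors[a]:
            if b in sp2_atoms and b not in partner:
                partner[a], partner[b] = b, a
                if solve(i + 1):
                    return True
                # Dead end: undo this choice and try the next neighbor.
                del partner[a]
                del partner[b]
        return False

    if not solve(0):
        return None
    return {frozenset((a, b)) for a, b in partner.items()}


if __name__ == "__main__":
    # Benzene, 'c1ccccc1': all six atoms need a double bond.
    print(assign_double_bonds(ring(6), set(range(6))))
    # Pyrrole, 'c1cc[nH]c1': treating all five lowercase atoms alike fails.
    print(assign_double_bonds(ring(5), set(range(5))))
    # Excluding the nitrogen (atom 3) makes a placement possible.
    print(assign_double_bonds(ring(5), {0, 1, 2, 4}))
```

The pyrrole case is the point about atom types: whether the nitrogen takes part in a double bond has to be decided before, or together with, the bond placement, and a wrong guess makes the search fail even though the molecule is perfectly valid.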

