Hi all,

Someone asked me recently about finding the graph edit distance between two small (<= 14 atom) fragments.
I figured this was something that could be brute-forced. Following SmallWorld's example at https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment, incrementally delete terminal atoms (except the "*" connection point atom) and ring bonds.

For chain bonds and non-aromatic bonds, it's easy to delete the bond and add the correct number of hydrogens to either side. But what should I do when I cut an aromatic bond?

For something like the first "co" in "c1cocn1", I want the result to be C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.

For something like "c1cnncn1", breaking on the "nn", I think I would like to get both 'N=CC=NC=N' and 'NC=CN=CN', because the "nn" bond can be a single or a double bond depending on the Kekule representation, as in:

    >>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
    'c1cnncn1'
    >>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
    'N=CC=NC=N'
    >>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
    'c1cnncn1'
    >>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
    'NC=CN=CN'

The problem is, I don't know how to figure out whether a given aromatic bond must be a "-" or a "=", or can be either. (Well, I could brute-force enumerate all 2**n possible aromatic bond assignments, then canonicalize, and see if both assignments are possible for a given bond.)

As a non-chemist, I also ask whether I'm even on a chemically meaningful track.

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
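
The brute-force idea in the parenthetical above can be sketched in plain Python, without RDKit. This is only a toy model under stated assumptions, not RDKit's kekulization: the hypothetical function `classify_aromatic_bonds` takes the aromatic bonds as atom-index pairs plus a caller-supplied set `needs_double` of atoms assumed to carry exactly one double bond in any Kekule form (for example aromatic "c" and pyridine-type "n", but not the "o" in "c1cocn1"). It tries all 2**n single/double assignments and records which orders each bond takes in the valid ones.

```python
from itertools import product


def classify_aromatic_bonds(bonds, needs_double):
    """Label each aromatic bond 'single', 'double', or 'either'.

    bonds: list of (a, b) atom-index pairs, one per aromatic bond.
    needs_double: set of atom indices ASSUMED to need exactly one double
        bond in every Kekule form; all other atoms must have none.
    """
    atoms = {atom for bond in bonds for atom in bond}
    seen = {bond: set() for bond in bonds}

    # Brute force: try every single (1) / double (2) assignment.
    for orders in product((1, 2), repeat=len(bonds)):
        doubles = dict.fromkeys(atoms, 0)
        for (a, b), order in zip(bonds, orders):
            if order == 2:
                doubles[a] += 1
                doubles[b] += 1
        # Keep the assignment only if every atom has the required count.
        if all(doubles[a] == (1 if a in needs_double else 0) for a in atoms):
            for bond, order in zip(bonds, orders):
                seen[bond].add(order)

    labels = {
        frozenset({1}): "single",
        frozenset({2}): "double",
        frozenset({1, 2}): "either",
    }
    return {
        bond: labels.get(frozenset(found), "no valid Kekule form")
        for bond, found in seen.items()
    }


# "c1cocn1": atoms 0=c, 1=c, 2=o, 3=c, 4=n; the "o" takes no double bond.
five_ring = classify_aromatic_bonds(
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)], needs_double={0, 1, 3, 4}
)
print(five_ring[(1, 2)])  # the first "co" bond: single

# "c1cnncn1": atoms 0=c, 1=c, 2=n, 3=n, 4=c, 5=n; every atom needs one.
six_ring = classify_aromatic_bonds(
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
    needs_double={0, 1, 2, 3, 4, 5},
)
print(six_ring[(2, 3)])  # the "nn" bond: either
```

With at most 14 atoms the 2**n loop stays small. Deciding which atoms belong in `needs_double` is the chemically sensitive part, so for real molecules that choice is probably best derived from the toolkit's own aromaticity perception rather than hard-coded as it is here.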