Hi all,

Someone recently asked me about finding the graph edit distance between two small
(<= 14 atom) fragments.

I figured this could be brute-forced. Following SmallWorld's example at
https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment, I
incrementally delete terminal atoms (except the "*" connection-point atom) and
ring bonds.
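Here is a minimal pure-Python sketch of that deletion loop. It assumes the fragment is a plain graph (atom indices plus bonds as index pairs) and treats a "ring bond" as any bond whose removal leaves the fragment in one piece. The names `one_step_deletions` and `enumerate_deletions` are hypothetical. Element labels and hydrogens are ignored, and states are keyed by atom/bond indices rather than canonical SMILES, so isomorphic results are not merged.

```python
from collections import deque


def is_connected(atoms, bonds):
    """Return True if every atom in `atoms` is reachable via `bonds`."""
    if not atoms:
        return True
    neighbors = {a: set() for a in atoms}
    for bond in bonds:
        a, b = tuple(bond)
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        for nbr in neighbors[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(atoms)


def one_step_deletions(atoms, bonds, protected):
    """Yield every (atoms, bonds) fragment reachable by a single deletion."""
    degree = {a: 0 for a in atoms}
    for bond in bonds:
        for a in bond:
            degree[a] += 1
    # Delete a terminal atom (degree 1) with its bond, unless it is protected ("*").
    for a in atoms:
        if degree[a] == 1 and a not in protected:
            yield atoms - {a}, frozenset(b for b in bonds if a not in b)
    # Delete a ring bond, i.e. one whose removal keeps the fragment connected.
    for bond in bonds:
        remaining = bonds - {bond}
        if is_connected(atoms, remaining):
            yield atoms, remaining


def enumerate_deletions(atoms, bonds, protected):
    """Breadth-first search mapping each reachable fragment to its deletion count."""
    start = (frozenset(atoms), frozenset(bonds))
    distance = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in one_step_deletions(state[0], state[1], protected):
            if nxt not in distance:
                distance[nxt] = distance[state] + 1
                queue.append(nxt)
    return distance
```

For "*C1CC1" (atom 0 is "*", bonds (0,1), (1,2), (2,3), (3,1)) this finds 8 index-level fragments, the farthest being the bare "*" after 4 deletions.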

For chain bonds and other non-aromatic bonds, it's easy to delete the bond and
add the correct number of hydrogens to each end.
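With RDKit, one way to do this is to remove the bond on an editable `Chem.RWMol` and let sanitization recompute the implicit hydrogens. This is a sketch, not necessarily the best approach. It assumes the atoms were written without brackets, so their hydrogen counts are implicit. `delete_bond` is a hypothetical helper name.

```python
from rdkit import Chem


def delete_bond(smiles, begin, end):
    """Remove the bond between atoms `begin` and `end`; return canonical SMILES."""
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles))
    mol.RemoveBond(begin, end)
    # Sanitizing recomputes implicit hydrogens, so each end atom picks up
    # as many H as the order of the bond that was removed.
    Chem.SanitizeMol(mol)
    return Chem.MolToSmiles(mol)


# Opening the single ring-closure bond of cyclohexene gives hex-1-ene.
print(delete_bond("C1=CCCCC1", 0, 5))
```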

But, what should I do when I cut an aromatic bond?

For something like the first "co" in "c1cocn1", I want the result to be 
C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.
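For the aromatic case, one workaround is to kekulize first with `clearAromaticFlags=True` and then cut. This assumes a single Kekule structure is acceptable. It reproduces the C=CN=CO result for this ring, but when a bond can be drawn either way it returns only whichever Kekule form RDKit happens to pick. That is exactly the limitation in question. `cut_bond_kekule` is a hypothetical helper.

```python
from rdkit import Chem


def cut_bond_kekule(smiles, begin, end):
    """Kekulize, then remove the bond between `begin` and `end`."""
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles))
    # Turn aromatic bonds into explicit single/double bonds (RDKit picks
    # one Kekule structure) and drop the aromatic flags, so the opened
    # chain is not left with "aromatic" atoms outside a ring.
    Chem.Kekulize(mol, clearAromaticFlags=True)
    mol.RemoveBond(begin, end)
    Chem.SanitizeMol(mol)  # recomputes implicit hydrogens
    return Chem.MolToSmiles(mol)


# The first "co" in c1cocn1 is the bond between atoms 1 and 2.
print(cut_bond_kekule("c1cocn1", 1, 2))  # same molecule as C=CN=CO
```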

For something like "c1cnncn1", when breaking the "nn" bond, I think I would like
to get both 'N=CC=NC=N' and 'NC=CN=CN', because the "nn" bond can be single or
double depending on the Kekule representation, as in:

>>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
'c1cnncn1'
>>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
'N=CC=NC=N'

>>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
'c1cnncn1'
>>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
'NC=CN=CN'

The problem is that I don't know how to determine whether a given aromatic bond
must be "-" or "=", or whether it can be either.

(Well, I could brute-force enumerate all 2**n possible aromatic bond
assignments, then canonicalize, and see whether both orders are possible for a
given bond.)
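A minimal pure-Python version of that brute force might look like this. It assumes the caller marks which ring atoms must carry exactly one double bond in a Kekule structure: aromatic "c" and pyridine-like "n" do; "o" and pyrrole-like "[nH]" do not. The function names are hypothetical, and exocyclic double bonds, charges and radicals are not handled. A bond can be drawn either way exactly when both orders appear across the valid assignments.

```python
from itertools import product


def kekule_assignments(n_atoms, bonds, needs_double):
    """Yield each tuple of bond orders (1 or 2, one per bond) in which every
    atom with needs_double[i] True has exactly one double bond and every
    other atom has none. Brute force over all 2**len(bonds) assignments."""
    for orders in product((1, 2), repeat=len(bonds)):
        doubles = [0] * n_atoms
        for (a, b), order in zip(bonds, orders):
            if order == 2:
                doubles[a] += 1
                doubles[b] += 1
        if all(doubles[i] == int(needs_double[i]) for i in range(n_atoms)):
            yield orders


def possible_orders(n_atoms, bonds, needs_double):
    """Map each bond to the set of orders it takes across all Kekule structures."""
    seen = {bond: set() for bond in bonds}
    for orders in kekule_assignments(n_atoms, bonds, needs_double):
        for bond, order in zip(bonds, orders):
            seen[bond].add(order)
    return seen


# c1cnncn1: atoms c0 c1 n2 n3 c4 n5, all of which need one double bond.
triazine = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(possible_orders(6, triazine, [True] * 6)[(2, 3)])  # {1, 2}: either order
```

A bond mapped to {1} must be "-", {2} must be "=", and {1, 2} can be either. For "c1cocn1" (with the "o" marked False), the bond from atom 1 to the "o" comes out as {1} only. For fragments of 14 atoms or fewer, 2**n stays small.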

As a non-chemist, I'd also like to ask whether I'm even on a chemically
meaningful track.


                                Andrew
                                da...@dalkescientific.com




_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss