Hello,

I had a theoretical question I thought the RDkit team might have some
insight on.

It's well known that the graph-isomorphism problem is NP, which means that
for some examples run-time will be worse than polynomial using known
algorithms. This problem is connected to molecule canonization. By
comparing canonical SMILES strings, we very efficiently determine whether
or not two molecular graphs are isomorphic or not.

That means canonizing SMILES graphs is either:

1. potentially very costly in some cases.
   AND/OR
2. fails to produce a unique string for particular graph structures.

So here is my question. What are the cases that are very difficult to
canonize a graph? By extension, are any particular classes of real
molecules difficult or impossible to canonize? If not, is that because
there are some restrictions on molecule graphs that can guarantee we avoid
the difficult cases (e.g. limited branching factor, or that molecules are
nearly planar?)?

Thanks for considering my question.

*S. Joshua Swamidass M.D. Ph.D.*
http://swami.wustl.edu/

Associate Professor, Laboratory and Genomic Medicine
Associate Professor, Biomedical Engineering
Washington University in St. Louis

Administrator: Lori Scantlan <*llscant...@wustl.edu <llscant...@wustl.edu>*>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to