Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-16 Thread Andrew Dalke
On Jun 16, 2023, at 03:15, S Joshua Swamidass  wrote:
> In graph theory, a planar graph is a graph that can be embedded in the plane, 
> i.e., it can be drawn on the plane in such a way that its edges intersect 
> only at their endpoints. In other words, it can be drawn in such a way that 
> no edges cross each other.

Years ago at 
http://www.dalkescientific.com/writings/diary/archive/2012/05/18/nonplanar_compounds.html
 I did a search of a subset of 28.5 million PubChem structures and found 224 
topologically non-planar examples, like 
https://pubchem.ncbi.nlm.nih.gov/compound/50919058 .

I also gave some literature citations, like "Synthesis of the first 
topologically non-planar molecule" (1981) at 
https://www.sciencedirect.com/science/article/abs/pii/0040403981800779 .

> On Jun 15, 2023, at 22:50, S Joshua Swamidass  wrote:
> 
> Have any other libraries adopted your approach? It's clever. 

It isn't my approach.

It depends on which part you consider clever. I've been told some of the ideas 
can be traced back to Bernhard Rohde's PhD thesis, citation 20 in the paper 
("who employed a stable numbering for equivalence classes instead of a 
sequential index"). Rohde is also thanked in the acknowledgements. And Rohde 
has/had his own in-house library at Novartis which included canonicalization.

I wouldn't be surprised if NextMove has their own implementation, given how 
Roger Sayle is a co-author of the paper.


> the covalent bonds of proteins of arbitrary size are a planar graph too, even 
> though most (all?) proteins have a 3D structure.

FWIW, due to disulfide bonds in cystines, proteins can be topologically 
non-planar. https://academic.oup.com/nar/article/47/D1/D367/5223942?login=false 
give PDB entry 1AOC as an example, with their database entry at 
https://knotprot.cent.uw.edu.pl/view/1aoc/A/ .

Best regards,

Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread S Joshua Swamidass
Planar graphs are...

In graph theory , a *planar
graph* is a graph
 that can be
embedded  in the plane
, i.e., it can be drawn on
the plane in such a way that its edges intersect only at their endpoints.
In other words, it can be drawn in such a way that no edges cross each
other.
https://en.wikipedia.org/wiki/Planar_graph

We aren't talking about whether or not the chemical structure is 2D or 3D.
Molecules representable as planar graphs can certainly be 3D molecules that
do not have planar structures. For example, cyclohexane is not a planar
molecule (vs. benzene for example), but it can trivally be drawn in 2D
without any crossing bonds. Likewise, (if ignore disulfide bonds and other
gotchas), the covalent bonds of proteins of arbitrary size are a planar
graph too, even though most (all?) proteins have a 3D structure.



On Thu, Jun 15, 2023 at 8:05 PM Francois Berenger  wrote:

> On 16/06/2023 03:49, S Joshua Swamidass wrote:
> > Incidentally,
> >
> > I came across this O(log N) canonization algorithm for planar graphs:
> > https://arxiv.org/pdf/0809.2319.pdf
> >
> > I wonder if this algorithm can be adapted for chemistry? Molecules are
> > usually planar, but I believe they occasionally are "nearly" planar,
> > by which I mean planar graphs plus a few edges that break the
> > planarity.
>
> Dear Joshua,
>
> Some natural product are notoriously complex 3D molecules.
>
> What do you exactly mean by planar?
>
> Many chemical groups are 3D: methyl, adamantane, etc.
>
> > And what (generally speaking) is the algorithm used by rdkit? Do we
> > know it's complexity?
> >
> > On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass
> >  wrote:
> >
> >> Andrew,
> >>
> >> Thanks! According to wikipedia (and my recollections of algorithms
> >> class)...
> >> "The problem is not known to be solvable in polynomial time [1] nor
> >> to be NP-complete [2], and therefore may be in the computational
> >> complexity class [3] NP-intermediate [4]."
> >>
> >> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
> >>
> >> Your reference though is really helpful. The key phrase seems to be
> >> "bounded valence" which is certainly true of molecular graphs. Each
> >> atom can only bound some fairly low number of other atoms, i.e.
> >> bounded valence. That's probably the reason why we do have a
> >> polynomial time algorithm...
> >>
> >> Thank you!
> >>
> >> Joshua
> >>
> >> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke
> >>  wrote:
> >>
> >>> On Jun 15, 2023, at 18:20, S Joshua Swamidass
> >>>  wrote:
>  It's well known that the graph-isomorphism problem is NP
> >>>
> >>> While P is contained in NP, I don't think that's the NP you mean.
> >>>
> >>> I suspect you may be thinking of subgraph isomorphism, which is
> >>> NP-hard. Graph isomorphism may be quasi-polynomial time, if
> >>> Babai's (unpublished) claim is correct.
> >>>
> >>> Also, "Isomorphism of graphs of bounded valence can be tested in
> >>> polynomial time" -
> >>> https://www.sciencedirect.com/science/article/pii/002282900095
> >>> .
> >>>
>  So here is my question. What are the cases that are very
> >>> difficult to canonize a graph?
> >>>
> >>> As I recall, handling chirality and other non-local properties is
> >>> difficult. I have not worked on this problem.
> >>>
> >>> Cheers,
> >>>
> >>> Andrew
> >>> da...@dalkescientific.com
> >
> >
> > Links:
> > --
> > [1] https://en.wikipedia.org/wiki/Polynomial_time
> > [2] https://en.wikipedia.org/wiki/NP-complete
> > [3] https://en.wikipedia.org/wiki/Complexity_class
> > [4] https://en.wikipedia.org/wiki/NP-intermediate
> > ___
> > Rdkit-discuss mailing list
> > Rdkit-discuss@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Francois Berenger

On 16/06/2023 03:49, S Joshua Swamidass wrote:

Incidentally,

I came across this O(log N) canonization algorithm for planar graphs:
https://arxiv.org/pdf/0809.2319.pdf

I wonder if this algorithm can be adapted for chemistry? Molecules are
usually planar, but I believe they occasionally are "nearly" planar,
by which I mean planar graphs plus a few edges that break the
planarity.


Dear Joshua,

Some natural product are notoriously complex 3D molecules.

What do you exactly mean by planar?

Many chemical groups are 3D: methyl, adamantane, etc.


And what (generally speaking) is the algorithm used by rdkit? Do we
know it's complexity?

On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass
 wrote:


Andrew,

Thanks! According to wikipedia (and my recollections of algorithms
class)...
"The problem is not known to be solvable in polynomial time [1] nor
to be NP-complete [2], and therefore may be in the computational
complexity class [3] NP-intermediate [4]."

https://en.wikipedia.org/wiki/Graph_isomorphism_problem

Your reference though is really helpful. The key phrase seems to be
"bounded valence" which is certainly true of molecular graphs. Each
atom can only bound some fairly low number of other atoms, i.e.
bounded valence. That's probably the reason why we do have a
polynomial time algorithm...

Thank you!

Joshua

On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke
 wrote:


On Jun 15, 2023, at 18:20, S Joshua Swamidass
 wrote:

It's well known that the graph-isomorphism problem is NP


While P is contained in NP, I don't think that's the NP you mean.

I suspect you may be thinking of subgraph isomorphism, which is
NP-hard. Graph isomorphism may be quasi-polynomial time, if
Babai's (unpublished) claim is correct.

Also, "Isomorphism of graphs of bounded valence can be tested in
polynomial time" -
https://www.sciencedirect.com/science/article/pii/002282900095
.


So here is my question. What are the cases that are very

difficult to canonize a graph?

As I recall, handling chirality and other non-local properties is
difficult. I have not worked on this problem.

Cheers,

Andrew
da...@dalkescientific.com



Links:
--
[1] https://en.wikipedia.org/wiki/Polynomial_time
[2] https://en.wikipedia.org/wiki/NP-complete
[3] https://en.wikipedia.org/wiki/Complexity_class
[4] https://en.wikipedia.org/wiki/NP-intermediate
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Andrew Dalke
On Jun 15, 2023, at 20:49, S Joshua Swamidass  wrote:
> 
> And what (generally speaking) is the algorithm used by rdkit? Do we know it's 
> complexity?

https://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00543

"Get Your Atoms in Order—An Open-Source Implementation of a Novel and Robust 
Molecular Canonicalization Algorithm"


Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Peter S. Shenkin
Well, if I'm recalling correctly, a highly symmetric structure like
buckminsterfullerene takes a long time to canonicalize.

I don't know what the formal definition of a planar graph is, but I would
guess it's not what chemists mean when they say a molecule is planar.

-P.

On Thu, Jun 15, 2023 at 2:51 PM S Joshua Swamidass 
wrote:

> Incidentally,
>
> I came across this O(log N) canonization algorithm for planar graphs:
> https://arxiv.org/pdf/0809.2319.pdf
>
> I wonder if this algorithm can be adapted for chemistry? Molecules are
> usually planar, but I believe they occasionally are "nearly" planar, by
> which I mean planar graphs plus a few edges that break the planarity.
>
> And what (generally speaking) is the algorithm used by rdkit? Do we know
> it's complexity?
>
>
>
>
> On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass 
> wrote:
>
>> Andrew,
>>
>> Thanks! According to wikipedia (and my recollections of algorithms
>> class)...
>> "The problem is not known to be solvable in polynomial time
>>  nor to be NP-complete
>> , and therefore may be in the
>> computational complexity class
>>  NP-intermediate
>> ."
>> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
>>
>> Your reference though is really helpful. The key phrase seems to be
>> "bounded valence" which is certainly true of molecular graphs. Each atom
>> can only bound some fairly low number of other atoms, i.e. bounded valence.
>> That's probably the reason why we do have a polynomial time algorithm...
>>
>> Thank you!
>>
>> Joshua
>>
>> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke 
>> wrote:
>>
>>> On Jun 15, 2023, at 18:20, S Joshua Swamidass 
>>> wrote:
>>> > It's well known that the graph-isomorphism problem is NP
>>>
>>> While P is contained in NP, I don't think that's the NP you mean.
>>>
>>> I suspect you may be thinking of subgraph isomorphism, which is NP-hard.
>>> Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished)
>>> claim is correct.
>>>
>>> Also, "Isomorphism of graphs of bounded valence can be tested in
>>> polynomial time" -
>>> https://www.sciencedirect.com/science/article/pii/002282900095 .
>>>
>>>
>>> > So here is my question. What are the cases that are very difficult to
>>> canonize a graph?
>>>
>>> As I recall, handling chirality and other non-local properties is
>>> difficult. I have not worked on this problem.
>>>
>>> Cheers,
>>>
>>> Andrew
>>> da...@dalkescientific.com
>>>
>>>
>>> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread S Joshua Swamidass
Incidentally,

I came across this O(log N) canonization algorithm for planar graphs:
https://arxiv.org/pdf/0809.2319.pdf

I wonder if this algorithm can be adapted for chemistry? Molecules are
usually planar, but I believe they occasionally are "nearly" planar, by
which I mean planar graphs plus a few edges that break the planarity.

And what (generally speaking) is the algorithm used by rdkit? Do we know
it's complexity?




On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass 
wrote:

> Andrew,
>
> Thanks! According to wikipedia (and my recollections of algorithms
> class)...
> "The problem is not known to be solvable in polynomial time
>  nor to be NP-complete
> , and therefore may be in the
> computational complexity class
>  NP-intermediate
> ."
> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
>
> Your reference though is really helpful. The key phrase seems to be
> "bounded valence" which is certainly true of molecular graphs. Each atom
> can only bound some fairly low number of other atoms, i.e. bounded valence.
> That's probably the reason why we do have a polynomial time algorithm...
>
> Thank you!
>
> Joshua
>
> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke 
> wrote:
>
>> On Jun 15, 2023, at 18:20, S Joshua Swamidass 
>> wrote:
>> > It's well known that the graph-isomorphism problem is NP
>>
>> While P is contained in NP, I don't think that's the NP you mean.
>>
>> I suspect you may be thinking of subgraph isomorphism, which is NP-hard.
>> Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished)
>> claim is correct.
>>
>> Also, "Isomorphism of graphs of bounded valence can be tested in
>> polynomial time" -
>> https://www.sciencedirect.com/science/article/pii/002282900095 .
>>
>>
>> > So here is my question. What are the cases that are very difficult to
>> canonize a graph?
>>
>> As I recall, handling chirality and other non-local properties is
>> difficult. I have not worked on this problem.
>>
>> Cheers,
>>
>> Andrew
>> da...@dalkescientific.com
>>
>>
>>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread S Joshua Swamidass
Andrew,

Thanks! According to wikipedia (and my recollections of algorithms class)...
"The problem is not known to be solvable in polynomial time
 nor to be NP-complete
, and therefore may be in the
computational complexity class
 NP-intermediate
."
https://en.wikipedia.org/wiki/Graph_isomorphism_problem

Your reference though is really helpful. The key phrase seems to be
"bounded valence" which is certainly true of molecular graphs. Each atom
can only bound some fairly low number of other atoms, i.e. bounded valence.
That's probably the reason why we do have a polynomial time algorithm...

Thank you!

Joshua

On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke 
wrote:

> On Jun 15, 2023, at 18:20, S Joshua Swamidass  wrote:
> > It's well known that the graph-isomorphism problem is NP
>
> While P is contained in NP, I don't think that's the NP you mean.
>
> I suspect you may be thinking of subgraph isomorphism, which is NP-hard.
> Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished)
> claim is correct.
>
> Also, "Isomorphism of graphs of bounded valence can be tested in
> polynomial time" -
> https://www.sciencedirect.com/science/article/pii/002282900095 .
>
>
> > So here is my question. What are the cases that are very difficult to
> canonize a graph?
>
> As I recall, handling chirality and other non-local properties is
> difficult. I have not worked on this problem.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Andrew Dalke
On Jun 15, 2023, at 18:20, S Joshua Swamidass  wrote:
> It's well known that the graph-isomorphism problem is NP

While P is contained in NP, I don't think that's the NP you mean.

I suspect you may be thinking of subgraph isomorphism, which is NP-hard. Graph 
isomorphism may be quasi-polynomial time, if Babai's (unpublished) claim is 
correct.

Also, "Isomorphism of graphs of bounded valence can be tested in polynomial 
time" - https://www.sciencedirect.com/science/article/pii/002282900095 .


> So here is my question. What are the cases that are very difficult to 
> canonize a graph?

As I recall, handling chirality and other non-local properties is difficult. I 
have not worked on this problem.

Cheers,

Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss