Thank you for answering, Gilleain.

Hello, The term 'subgraph isomorphism' is (I think) usually used to mean
> finding a subgraph of a graph G that is isomorphic to another graph H. What
> you describe sounds like the more difficult problem of finding all common
> subgraphs of G and H. So, I am not so sure that the javadocs are unclear in
> this case.


Hm, I couldn't really find an authoritative definition - and I am certain a
number of people could argue about this, but from the literature I've seen,
I always thought that two graphs may have a large number of isomorphisms,
and it is the largest isomorphism that is selected by procedures like
maximal common substructure searching, e.g. via the Ullman algorithm. When
one starts with an atom/bond pair in G1 and G2 and attempts to iteratively
extend the initial matched fragment, it would seem a trivial task to output
at every step of the iterative extension the subgraph, or a part of it, that
does not yet violate edge/node matching criteria set forth in the subgraph
isomorphism searching algorithm. As far as I understood, these subgraphs are
collected anyway for the purposes of comparison, unless superseded by an
extension, and only the result of the longest extension-matching procedure
is selected as the MCS. So, I erred and believed that the
getSubgraphAtomsMaps would give me precisely that, a collection of all the
fragments that match and their variations.

There is the possibility that H may be found multiple times in G (a polymer,
> for example, will have multiple copies of the monomer in it), which is
> sumably why the return value of these methods are lists.


Again, I thought that was for the MCS, but please, feel free to let me know
if I was wrong. I just wanted to make certain that CDK does not currently do
this and that I am not re-inventing the wheel here. And again, I suppose I
just misinterpreted the javadoc, sorry.

 As for how this could be done - well the simplest (and most expensive) way
> would be to enumerate every (connected?) subgraph of H, testing each one to
> see if it is also a subgraph of G. Gilleain Torrance


That's one method that I have tried, however, in many cases, the cost of
enumerating all the possible connected subgraphs of H is astronomical. For
instance, if we cut five bonds of a molecule that contains 40 bonds, we will
have about 7.5e10 fragments to enumerate. If we engage in growing fragments
with all possible branches, that also often leads to combinatorial
explosions. I have created an 'approximate solution', which is more
efficient than what I am outlining here, but it's still an approximation.
So, for all practical purposes, that is not a feasible approach. Well, it
would have been easier to do it as I mentioned, that all fragments beyond a
certain size would be reported in the generic collection at every iteration
of the subgraph isomorphism detection tool. This has already been done and
published numerous times, without the release of the source code. Oh well,
if no one currently does this, I guess I'll go ahead and reinvent the wheel
yet again.

Any other opinions, anyone? I know you are there!

Regards,

Leonid Chepelev
------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user

Reply via email to