Thanks for the feedback and code example!

I understand that building a third query mol with MCS, one that matches both 
original mols, and then matching against it works.  However, this seems like 
overkill for this particular problem: as I understand it, MCS can be very 
expensive depending on the compounds being compared.  Would it not work to 
simply override the atom.Match function with one that always matches dummies, 
no matter what the other atom is?  I am not planning to match SMARTS-style 
queries of any complexity beyond simple dummy atoms.  In fact, as I understand 
it, my example compounds do not contain any query atoms when they are read 
into RDKit; the dummies are only made into queries after the read, by the 
AdjustQueryProperties step.  I am definitely not interested in doing generic 
query-to-query matching.
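
The rule proposed above (a dummy on either side matches any atom) can be sketched independently of RDKit. The snippet below is only a pure-Python illustration of that rule, not RDKit's matcher and not a way to override Atom.Match: the molecules are hypothetical label/bond dictionaries keyed by atom map number, bond orders are ignored, and atoms_match, bonded and find_mapping are made-up helper names.

```python
# Toy graphs for the two example molecules, keyed by atom map number.
# "*" marks a dummy atom; bond orders are ignored in this sketch.
M1 = {
    "atoms": {1: "*", 2: "C", 3: "C", 4: "C", 5: "C"},
    "bonds": {(1, 2), (2, 3), (3, 4), (3, 5)},
}
M2 = {
    "atoms": {11: "F", 12: "C", 13: "C", 14: "*", 15: "C"},
    "bonds": {(11, 12), (12, 13), (13, 14), (13, 15)},
}


def atoms_match(a, b):
    """Symmetric rule: a dummy on either side matches any atom."""
    return a == "*" or b == "*" or a == b


def bonded(mol, i, j):
    """True if atoms i and j are bonded, in either listed direction."""
    return (i, j) in mol["bonds"] or (j, i) in mol["bonds"]


def find_mapping(query, target):
    """Return the first query->target atom mapping found, or None."""
    q_ids = sorted(query["atoms"])
    t_ids = sorted(target["atoms"])

    def extend(mapping):
        if len(mapping) == len(q_ids):
            return dict(mapping)
        q = q_ids[len(mapping)]
        for t in t_ids:
            if t in mapping.values():
                continue
            if not atoms_match(query["atoms"][q], target["atoms"][t]):
                continue
            # Every query bond to an already-mapped atom must exist in the target.
            if all(bonded(target, mapping[p], t)
                   for p in mapping if bonded(query, p, q)):
                mapping[q] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[q]
        return None

    return extend({})


forward = find_mapping(M1, M2)
backward = find_mapping(M2, M1)
print(forward)   # {1: 11, 2: 12, 3: 13, 4: 14, 5: 15}
print(backward)  # {11: 1, 12: 2, 13: 3, 14: 4, 15: 5}
```

With the symmetric rule both directions give the mapping 1 to 11, 2 to 12, and so on. Whether the same rule can be plugged into GetSubstructMatches is a separate question that this sketch does not answer.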

- Kovas


From: Christos Kannas <chriskan...@gmail.com>
Date: Thursday, August 23, 2018 at 7:53 AM
To: Kovas Palunas <kovas.palu...@arzeda.com>
Cc: RDKit <rdkit-discuss@lists.sourceforge.net>, Paolo Tosco 
<paolo.tosco.m...@gmail.com>
Subject: Re: [Rdkit-discuss] Matching Generalized Compounds

Hi Kovas,

You have two fuzzy compounds that you are trying to match against each other, 
because intuition says that the any-atom [*:1] from m1 should match the 
fluorine [F:11] in m2, and the any-atom [*:14] in m2 should match the carbon 
[CH3:4] in m1.
The issue here is that you create two query compounds from m1 and m2, each of 
which will match its own specific substructures. Query-to-query matching is 
not trivial.

To do what you want, you need a query compound that combines their 
characteristics, which is what Paolo showed.
Using MCS and modifying atom properties, Paolo created that query compound: 
'[*:1]-[CH2:2]-[C:3](-[*:4])=[CH2:5]' or 
'[*:1]-[CH2X4:2]-[CX3:3](-[*:4])=[CH2X3:5]'
Also bear in mind that Paolo's approach changes the starting compounds, as they 
now resemble the generic query compound that combines their fuzzy atoms.
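
The one-directional behaviour described above, and the effect of a combined query, can be illustrated without RDKit. This is only a sketch under simplifying assumptions: atoms are taken as already aligned position by position (so no graph search is done), each atom is reduced to an element label with "*" for a dummy, and query_matches and merge are hypothetical helpers, not RDKit functions.

```python
# Atom labels of the two example molecules, listed in atom-map order
# (1..5 for m1, 11..15 for m2). "*" stands for a dummy atom.
m1 = ["*", "C", "C", "C", "C"]
m2 = ["F", "C", "C", "*", "C"]


def query_matches(query, target):
    """One-directional rule: only a dummy in the query acts as a wildcard."""
    return all(q == "*" or q == t for q, t in zip(query, target))


def merge(a, b):
    """Combined query: wildcard wherever either molecule has a dummy.

    Assumes the non-dummy positions of a and b already agree.
    """
    return ["*" if "*" in (x, y) else x for x, y in zip(a, b)]


combined = merge(m1, m2)
print(combined)                     # ['*', 'C', 'C', '*', 'C']
print(query_matches(m1, m2))        # False: carbon 4 cannot match dummy 14
print(query_matches(m2, m1))        # False: fluorine 11 cannot match dummy 1
print(query_matches(combined, m1))  # True
print(query_matches(combined, m2))  # True
```

The merged pattern has the same shape as the first SMARTS quoted above: wildcards at positions 1 and 4, carbons elsewhere.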

https://gist.github.com/CKannas/ac1a4791dec909552d7c8899cfaff030

Best,

Christos

Christos Kannas

Chem[o]informatics Researcher & Software Developer



On Thu, 23 Aug 2018 at 12:36, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:

Dear Kovas,

It looks like GetSubstructMatch() only finds a match if the dummy atom is in 
the query, not if it is in the molecule that you are matching the query against.

This notebook presents a possible solution off the top of my head:

https://gist.github.com/ptosco/a35ac28a14103b47096f6d6af1aec831

which does not involve changes to the C++ layer, even though it is 
computationally more expensive and will fail with disconnected fragments, as it 
uses FindMCS(). There may be better solutions; this is what I came up with 
last night in the little time I had available.

Cheers,
P.

On 08/22/18 19:34, Kovas Palunas wrote:
Hi All,

I’m interested in having GetSubstructMatches return non-“null” results in the 
following example.  The results should lead to a match where atom 1 maps to 
atom 11, 2 to 12, etc.

from rdkit import Chem

m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')

### do something here so that the mols will match ###
# turn the dummy atoms into "match anything" query atoms
qp = Chem.AdjustQueryParameters()
qp.makeDummiesQueries = True
m1 = Chem.AdjustQueryProperties(m1, qp)
m2 = Chem.AdjustQueryProperties(m2, qp)

# I'd like both of the following to return results
m1.GetSubstructMatches(m2)
m2.GetSubstructMatches(m1)

My understanding of why these mols currently do not match is as follows: 
because only the dummy atoms are made queries (based on my query parameter 
adjustment), when one mol is matched to the other, dummy 1 may match F:11, but 
dummy 14 will then not match methyl:4.  This is because (as I understand it) 
normal atoms can only be matched by queries, and cannot match queries themselves.

Potential ideas to make this work as I’d like:

  1.  Override atom.Match in the Python code. I am not sure that this would 
work, since the C++ version of this function is what would be called during 
GetSubstructMatches.
  2.  Override atom.Match in the C++ code. I am not quite sure how to do this, 
or what side effects it might have.  Ideally the changes I make would only 
affect this example (and other similar ones).
  3.  Make all atoms in both molecules QueryAtoms, but otherwise leave them 
unchanged.  I’m not quite sure how to do this!

Does anyone have any ideas about what the best approach here would be, or know 
whether there is already built-in functionality for something like this?  I’d 
prefer not to use SMARTS to construct my molecules if possible, since I don’t 
really think of them as queries, just as other molecules in the system that 
happen to not be fully specified.

- Kovas




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
