Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
On Jul 23, 2021, at 01:01, Gustavo Seabra wrote:
> I actually want the sulfone to be found, if it is there. My problem is that I
> also want flexibility to change the ring atoms and still find the ring as a
> match, while considering a match on the sulfone only if it really is there.
> (e.g., CF3 should *not* match.) Does it make sense?

Ahh, I see. No, there's no way to do that.

The best I can suggest is to go back to the original Python implementation and change the code leading up to

  https://hg.sr.ht/~dalke/fmcs/browse/fmcs.py?rev=tip#L1929

so the initial seed is the sulfone instead of an (atom, bond, atom). Then use that to find the MCS with the sulfone, and if that fails, use RDKit's existing method.

I point to my repository only because that's in Python and I know it better. If your C++ skills are better than mine, you might change the corresponding implementation in RDKit.

Cheers,

Andrew
da...@dalkescientific.com

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
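The "seed with the sulfone, then fall back" control flow described above can be roughed out without editing fmcs.py. This is only a sketch under assumptions: it relies on the `seedSmarts` keyword of `rdFMCS.FindMCS`, and whether that seed behaves like the seed change proposed for fmcs.py has not been checked. The helper name `mcs_prefer_group` and the three test molecules are hypothetical, chosen for illustration.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def mcs_prefer_group(mols, group_smarts, **mcs_kwargs):
    """Hypothetical helper: try an MCS seeded with `group_smarts`, else a plain MCS.

    Returns (result, seeded), where `seeded` says which search produced the result.
    """
    group = Chem.MolFromSmarts(group_smarts)
    # Only attempt the seeded search when every molecule really contains the group.
    if all(mol.HasSubstructMatch(group) for mol in mols):
        seeded_result = rdFMCS.FindMCS(mols, seedSmarts=group_smarts, **mcs_kwargs)
        if seeded_result.numAtoms > 0:
            return seeded_result, True
    # Otherwise fall back to RDKit's ordinary, unseeded search.
    return rdFMCS.FindMCS(mols, **mcs_kwargs), False


# Illustrative molecules (assumptions, not taken from the thread).
sulfone = "[#16](=[#8])=[#8]"
phenyl_sulfone = Chem.MolFromSmiles("CS(=O)(=O)c1ccccc1")
pyridyl_sulfone = Chem.MolFromSmiles("CS(=O)(=O)c1ccncc1")
pyridyl_cf3 = Chem.MolFromSmiles("FC(F)(F)c1ccncc1")

with_group, used_seed = mcs_prefer_group([phenyl_sulfone, pyridyl_sulfone], sulfone)
without_group, used_seed_cf3 = mcs_prefer_group([phenyl_sulfone, pyridyl_cf3], sulfone)

print(used_seed, with_group.smartsString)
print(used_seed_cf3, without_group.smartsString)
```

The explicit `HasSubstructMatch` check is there so the sketch does not depend on how a given RDKit version treats a seed that is missing from one of the molecules. It does not by itself make the ring atoms interchangeable; that still depends on the atom comparison passed through `mcs_kwargs`.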
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi,

Thanks a lot for the reply! However, in this case, it looks like I would have to somehow label the isotope in every query molecule, right? For example:

```
template = Chem.MolFromSmarts('[c]1(-[2S](=[3O])(=[3O])(-C)):[c]:[c]:[c]:[c]:[c]:1')
mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareIsotopes,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
```

returns: '[0*]1:[0*]:[0*]:[0*]:[0*]:[0*]:1', that is, it only picks the ring but not the sulfone.

I actually want the sulfone to be found, if it is there. My problem is that I also want flexibility to change the ring atoms and still find the ring as a match, while considering a match on the sulfone only if it really is there. (e.g., CF3 should *not* match.) Does it make sense?

Thanks a lot!
--
Gustavo Seabra.
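Labelling every molecule by hand is not strictly necessary: the isotope labels can be applied programmatically before the MCS search. The sketch below is one way to do that, under assumptions: the helper `label_for_mcs`, the sulfone SMARTS, and the three test molecules are hypothetical and not from the thread. Every atom gets isotope 1 ("any"), and the sulfone S and O atoms are then overwritten with 2 and 3, following the labelling scheme suggested earlier in the thread.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Assumed definition of "sulfone": four-connected S with two terminal double-bonded O.
SULFONE = Chem.MolFromSmarts("[#16X4](=[OX1])=[OX1]")


def label_for_mcs(mol):
    """Hypothetical helper: return a copy with isotope 1 on 'any' atoms,
    2 on sulfone S and 3 on sulfone O."""
    labelled = Chem.Mol(mol)
    for atom in labelled.GetAtoms():
        atom.SetIsotope(1)
    # Each match lists atom indices in query order: S, O, O.
    for s_idx, o1_idx, o2_idx in labelled.GetSubstructMatches(SULFONE):
        labelled.GetAtomWithIdx(s_idx).SetIsotope(2)
        labelled.GetAtomWithIdx(o1_idx).SetIsotope(3)
        labelled.GetAtomWithIdx(o2_idx).SetIsotope(3)
    return labelled


def isotope_mcs(mol_a, mol_b):
    """Run FindMCS comparing atoms by isotope label only."""
    return rdFMCS.FindMCS([mol_a, mol_b],
                          atomCompare=rdFMCS.AtomCompare.CompareIsotopes)


# Illustrative molecules (assumptions, not taken from the thread).
phenyl_sulfone = label_for_mcs(Chem.MolFromSmiles("CS(=O)(=O)c1ccccc1"))
pyridyl_sulfone = label_for_mcs(Chem.MolFromSmiles("CS(=O)(=O)c1ccncc1"))
pyridyl_cf3 = label_for_mcs(Chem.MolFromSmiles("FC(F)(F)c1ccncc1"))

sulfone_labels = [atom.GetIsotope() for atom in phenyl_sulfone.GetAtoms()]
both_sulfones = isotope_mcs(phenyl_sulfone, pyridyl_sulfone)
sulfone_vs_cf3 = isotope_mcs(phenyl_sulfone, pyridyl_cf3)

print(sulfone_labels)
print(both_sulfones.numAtoms, both_sulfones.smartsString)
print(sulfone_vs_cf3.numAtoms, sulfone_vs_cf3.smartsString)
```

With these labels the ring nitrogen and ring carbons are interchangeable, while the CF3 group cannot stand in for the SO2. The trade-off is that all isotope-1 atoms match each other regardless of element, so this only protects the atoms that were given their own labels.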
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi Gustavo,

> template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')

Unless things have changed since I last looked at the algorithm, you can't meaningfully pass a SMARTS-based query molecule into the MCS program, outside of a few simple cases.

It generates a SMARTS pattern based on the properties of the molecule. You asked it to CompareElements, but those [a] terms all have an atomic number of 0.

  >>> template = Chem.MolFromSmarts('[a#1]1(-[S](-*)(=[O])=[O]):[a#1]:[a#1]:[a#1]:[a#1]:[a#1]:1')
  >>> [a.GetAtomicNum() for a in template.GetAtoms()]
  [0, 16, 0, 8, 8, 0, 0, 0, 0, 0]

That's why your CompareAny search returns the #0 terms, like:

  '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

> I would appreciate some pointers on how it would be possible to find the
> maximum common substructure of 2 molecules, where in the template structure
> some atoms may be *any*, but some other atoms must be fixed.

Perhaps with isotope labelling?

That is, label the "any" atoms as isotope 1, and label your -[S](=[O])(=[O])- as -[2S](=[3O])(=[3O])-

Then use rdFMCS.AtomCompare.CompareIsotopes.

If there's anything you don't want to match at all, give each atom a unique isotope value.

Best regards,

Andrew
da...@dalkescientific.com
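The point about atomic numbers is easy to check side by side. The short sketch below contrasts the template parsed from SMARTS with an ordinary molecule parsed from SMILES; the molecule `CS(=O)(=O)c1ccccc1` and the helper name `atomic_numbers` are assumptions added for illustration.

```python
from rdkit import Chem


def atomic_numbers(mol):
    """List the atomic number stored on each atom, in atom-index order."""
    return [atom.GetAtomicNum() for atom in mol.GetAtoms()]


# Query molecule from SMARTS: [a] and * carry no element, so they report 0.
template = Chem.MolFromSmarts("[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1")
# Ordinary molecule from SMILES: every atom has a real element.
molecule = Chem.MolFromSmiles("CS(=O)(=O)c1ccccc1")

template_numbers = atomic_numbers(template)
molecule_numbers = atomic_numbers(molecule)

print(template_numbers)
print(molecule_numbers)
```

Because the MCS code compares these stored properties rather than evaluating the SMARTS queries, an element-based comparison sees the template's ring atoms as "element 0", which no real atom has.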
[Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi all,

I would appreciate some pointers on how it would be possible to find the maximum common substructure of 2 molecules, where in the template structure some atoms may be *any*, but some other atoms must be fixed.

Currently, I'm trying to use the rdFMCS module. For example:

    from rdkit import Chem
    from rdkit.Chem import rdFMCS

    template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
    # This should give a sulfone connected to an aromatic ring and
    # some other (any) element. Notice that the ring may have
    # any atoms (N,C,O), but for me it is important to have the SO2 group.

    mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
    # This molecule has the pattern.

    # Now, if I try to find a substructure match, I use:
    compare = [template, mol1]
    res = rdFMCS.FindMCS(compare,
                         atomCompare=rdFMCS.AtomCompare.CompareElements,
                         bondCompare=rdFMCS.BondCompare.CompareAny,
                         ringMatchesRingOnly=False,
                         completeRingsOnly=False)
    res.smartsString
    # gives: '[#16](=[#8])=[#8]'
    # Notice that the only match is the SO2, it does not match the ring.

However, if I try that with another structure that has a CF3 in place of the SO2, I get:

    mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
    compare = [template, mol2]
    res = rdFMCS.FindMCS(compare,
                         atomCompare=rdFMCS.AtomCompare.CompareElements,
                         bondCompare=rdFMCS.BondCompare.CompareAny,
                         ringMatchesRingOnly=False,
                         completeRingsOnly=False)
    res.smartsString
    # Returns: '' (empty string)

    # If I change to AtomCompare.CompareAny, now a CF3 will also match
    # in the SO2-X:
    mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
    compare = [template, mol2]
    res = rdFMCS.FindMCS(compare,
                         atomCompare=rdFMCS.AtomCompare.CompareAny,
                         bondCompare=rdFMCS.BondCompare.CompareAny,
                         ringMatchesRingOnly=False,
                         completeRingsOnly=False)
    res.smartsString
    # Returns: '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

But now the CF3 is counted in place of the SO2. The result I'd like to get here would be just the ring, as in the case:

    new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1')
    mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
    compare = [new_template, mol2]
    res = rdFMCS.FindMCS(compare,
                         atomCompare=rdFMCS.AtomCompare.CompareElements,
                         bondCompare=rdFMCS.BondCompare.CompareAny,
                         ringMatchesRingOnly=False,
                         completeRingsOnly=False)
    res.smartsString
    # Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring)

Notice that if I use CompareElements, there seems to be no way to match the ring with either N or C.

Does anyone have a suggestion on how I can specify flexibility (similar to AtomCompare.CompareAny) only for a portion of the molecule and still enforce specific atoms in another portion?

Thank you so much!
--
Gustavo Seabra.
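If the goal is only to test whether a molecule contains the "ring plus sulfone" pattern, and to fall back to the bare ring when it does not, an ordinary SMARTS substructure search may be enough. To be clear, this swaps in a different technique: it is substructure matching against fixed patterns, not a maximum common substructure search. The ring-only pattern, the helper `match_template`, and the two test molecules below are assumptions added for illustration.

```python
from rdkit import Chem

# Full template from the message above, plus an assumed ring-only fallback pattern.
FULL = Chem.MolFromSmarts("[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1")
RING = Chem.MolFromSmarts("[a]1:[a]:[a]:[a]:[a]:[a]:1")


def match_template(mol):
    """Hypothetical helper: return (label, atom indices) for the most
    specific pattern that matches, trying the full template first."""
    for label, pattern in (("ring+sulfone", FULL), ("ring only", RING)):
        match = mol.GetSubstructMatch(pattern)
        if match:
            return label, match
    return "no match", ()


# Illustrative molecules (assumptions, not taken from the thread).
pyridyl_sulfone = Chem.MolFromSmiles("CS(=O)(=O)c1ccncc1")
pyridyl_cf3 = Chem.MolFromSmiles("FC(F)(F)c1ccncc1")

sulfone_label, sulfone_atoms = match_template(pyridyl_sulfone)
cf3_label, cf3_atoms = match_template(pyridyl_cf3)

print(sulfone_label, sulfone_atoms)
print(cf3_label, cf3_atoms)
```

Here the `[a]` ring atoms accept C or N because the SMARTS queries are actually evaluated, and the CF3 group cannot satisfy the `[S]` and `[O]` terms. The limitation is that only the listed patterns are tried, so partial overlaps that a true MCS search would find are not reported.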