Hi Paolo,

sorry, I made a typo (makeBondGeneric instead of makeBondsGeneric); that's why the bonds weren't UNSPECIFIED. The following example now works fine for these two SMILES (the first structure is found in the second one):
C12=CC=CN1NCCC2 and C12C=CC=C(C=C3)C=1N3NCC2

However, there is another example where this code still doesn't work; see below. The two SMILES N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3 and C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3 describe the identical structure but were drawn differently in ChemDraw. As a consequence the SMILES differ, which shouldn't be a problem. But if I put these SMILES into the code below, the first one doesn't match the second one, nor the other way around. I must be doing something horribly wrong. Do I have to canonicalize the SMILES first? Is there a good tutorial on substructure search with RDKit that covers all its options, frequently asked questions, and plenty of examples?

Best,
Theo.

### start of code
from rdkit import Chem

smiles_strings = '''
N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
'''
smiles_list = smiles_strings.splitlines()[1:]
print(smiles_list)

params = Chem.SmilesParserParams()
params.sanitize = False
mols = [Chem.MolFromSmiles(x, params) for x in smiles_list]

pattern = Chem.MolFromSmiles(smiles_list[0], params)
query_params = Chem.AdjustQueryParameters()
query_params.makeBondsGeneric = True
query_params.aromatizeIfPossible = False
query_params.adjustDegree = False
query_params.adjustHeavyDegree = False
pattern_generic_bonds = Chem.AdjustQueryProperties(pattern, query_params)

matches = [idx for idx, m in enumerate(mols) if m.HasSubstructMatch(pattern_generic_bonds)]
print("{} of {}: {}".format(len(matches), len(smiles_list), matches))
### end of code

On 19.05.2020 at 18:30, Paolo Tosco wrote:
> Hi Theo,
>
> I don't think the RDKit version should make a difference; did you notice that
> rdmolops.AdjustQueryProperties() does not modify the molecule in place, but
> rather returns a modified copy?
>
> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern, query_params)
>
> That might be the reason.
> Also, only pattern_generic_bonds will have UNSPECIFIED bonds; the mols will
> still have SINGLE and DOUBLE bonds.
>
> Feel free to contact me off-list if you need help with the above.
>
> Cheers,
> p.
>
> On 19/05/2020 17:01, theozh wrote:
>> Hi Paolo,
>>
>> thank you very much for your detailed answer.
>> I tried to reproduce your last suggestion (but I don't have Jupyter
>> Notebook).
>> However, my bonds are still SINGLE and DOUBLE instead of UNSPECIFIED.
>> Does this maybe depend on the RDKit version? I have 2019.03...
>>
>> Maybe I should update and investigate further.
>> Theo.
>>
>>
>> On 19.05.2020 at 16:44, Paolo Tosco wrote:
>>> Hi Theo,
>>>
>>> the lack of match is due to different aromaticity flags on atoms and bonds
>>> in the larger molecule.
>>>
>>> This gist provides some explanation and a possible solution:
>>>
>>> https://gist.github.com/ptosco/e410e45278b94e8f047ff224193d7788
>>>
>>> Cheers,
>>> p.
>>>
>>> On 19/05/2020 14:13, theozh wrote:
>>>> Dear RDKit-users,
>>>>
>>>> I would like to do a very simple substructure search.
>>>> Chapter 3.5 "Substructure Searching" in the RDKit Documentation
>>>> (2019.09.1) is pretty short and doesn't point to a solution. So far, I've
>>>> learned that you can create your search pattern via Chem.MolFromSmiles()
>>>> or Chem.MolFromSmarts().
>>>>
>>>> In the copy&paste minimal example below, I want to use the first SMILES in
>>>> the list as the search pattern. I expect 2 matches but I get either 1 or 0
>>>> matches. So, I'm doing something wrong. What am I missing?
>>>> Is it about implicit/explicit aromatic and aliphatic bonds or some
>>>> explicit/implicit hydrogens?
>>>> How can I find the first structure in both SMILES?
>>>>
>>>> Thank you for any hints,
>>>> Theo.
>>>>
>>>> ### simple substructure search (but doesn't find what is expected)
>>>> from rdkit import Chem
>>>>
>>>> smiles_strings = '''
>>>> C12=CC=CN1NCCC2
>>>> C12=CC=CC(C=C3)=C1N3NCC2
>>>> '''
>>>> smiles_list = smiles_strings.splitlines()[1:]
>>>> print(smiles_list)
>>>>
>>>> pattern = Chem.MolFromSmiles(smiles_list[0])    # MolFromSmiles
>>>> matches = [x for x in smiles_list if
>>>>            Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
>>>> print(len(matches))    # result: 1, why not 2?
>>>>
>>>> pattern = Chem.MolFromSmarts(smiles_list[0])    # MolFromSmarts
>>>> matches = [x for x in smiles_list if
>>>>            Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
>>>> print(len(matches))    # result: 0, why not 2?
>>>> ### end of code
>>>>
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
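On the question in the latest message of whether the SMILES need to be canonicalized: the following is only a minimal sketch, not a fix for the sanitize=False code in the thread. It assumes the default sanitization of Chem.MolFromSmiles() is acceptable for these two inputs, and the variable names are illustrative. It compares the canonical SMILES that RDKit writes for each molecule and, as a second check, tests whether each parsed molecule is found as a substructure of the other.

```python
from rdkit import Chem

# The two ChemDraw-derived SMILES from the thread
smiles_a = "N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3"
smiles_b = "C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3"

# Parse with default settings (sanitization on, unlike the thread's code),
# so both molecules go through the same aromaticity perception
mol_a = Chem.MolFromSmiles(smiles_a)
mol_b = Chem.MolFromSmiles(smiles_b)

# MolToSmiles writes canonical SMILES by default; equal strings mean
# RDKit considers the two inputs to be the same structure
canonical_a = Chem.MolToSmiles(mol_a)
canonical_b = Chem.MolToSmiles(mol_b)
same_structure = canonical_a == canonical_b

# Second check: each molecule should be a substructure of the other
mutual_match = mol_a.HasSubstructMatch(mol_b) and mol_b.HasSubstructMatch(mol_a)

print(canonical_a)
print(canonical_b)
print(same_structure, mutual_match)
```

Comparing canonical SMILES only answers the "same structure?" question. For the substructure search itself, the pattern and the targets are still matched atom by atom and bond by bond, so the aromaticity flags mentioned earlier in the thread remain relevant.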