Re: [Rdkit-discuss] HasSubstructMatch return False where it shouldn't
Hi Michał,

Have you tried using AdjustQueryProperties()? I think Greg mentioned it in his presentation at the UGM:
http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#AdjustQueryProperties

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-11-01 19:20 GMT+01:00 Michał Nowotka:
> Hi,
>
> I have this molfile (CHEMBL265667):
> [...]
> Am I missing some flag or doing something wrong?

--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] HasSubstructMatch return False where it shouldn't
Hi,

I have this molfile (CHEMBL265667):

          11280714432D 1   1.00000     0.00000     0

 25 27  0  0  0  0  0  0  0  0999 V2000
    3.8042   -1.6000    0.0000 C   0  0  0  0  0  0  0  0  0
    4.3167   -1.9000    0.0000 N   0  0  3  0  0  0  0  0  0
    3.8042   -1.0000    0.0000 N   0  0  0  0  0  0  0  0  0
    4.8417   -1.6000    0.0000 N   0  0  0  0  0  0  0  0  0
    4.3167   -2.5000    0.0000 C   0  0  0  0  0  0  0  0  0
    4.3167   -3.6917    0.0000 C   0  0  0  0  0  0  0  0  0
    4.8417   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0
    4.3167   -0.7000    0.0000 C   0  0  0  0  0  0  0  0  0
    3.7917   -3.3917    0.0000 C   0  0  0  0  0  0  0  0  0
    4.8375   -3.3917    0.0000 C   0  0  0  0  0  0  0  0  0
    3.8000   -2.7917    0.0000 C   0  0  0  0  0  0  0  0  0
    4.8375   -2.7917    0.0000 C   0  0  0  0  0  0  0  0  0
    4.3167   -4.2917    0.0000 C   0  0  3  0  0  0  0  0  0
    3.2875   -1.8917    0.0000 O   0  0  0  0  0  0  0  0  0
    4.8375   -4.5917    0.0000 C   0  0  0  0  0  0  0  0  0
    4.3167   -0.0917    0.0000 O   0  0  0  0  0  0  0  0  0
    4.8292   -5.1917    0.0000 C   0  0  0  0  0  0  0  0  0
    5.3500   -4.2917    0.0000 C   0  0  0  0  0  0  0  0  0
    5.8667   -5.1917    0.0000 C   0  0  0  0  0  0  0  0  0
    3.7917   -4.5917    0.0000 O   0  0  0  0  0  0  0  0  0
    5.8667   -4.5917    0.0000 C   0  0  0  0  0  0  0  0  0
    5.3542   -5.4917    0.0000 C   0  0  0  0  0  0  0  0  0
    6.3917   -5.4917    0.0000 Cl  0  0  0  0  0  0  0  0  0
    3.2750   -3.6917    0.0000 C   0  0  0  0  0  0  0  0  0
    5.3542   -3.6917    0.0000 C   0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0
  3  1  1  0  0  0
  4  2  1  0  0  0
  5  2  1  0  0  0
  6 10  1  0  0  0
  7  8  1  0  0  0
  8  3  1  0  0  0
  9 11  1  0  0  0
 10 12  2  0  0  0
 11  5  2  0  0  0
 12  5  1  0  0  0
 13  6  1  0  0  0
 14  1  2  0  0  0
 15 13  1  0  0  0
 16  8  2  0  0  0
 17 15  2  0  0  0
 18 15  1  0  0  0
 19 21  1  0  0  0
 20 13  1  0  0  0
 21 18  2  0  0  0
 22 17  1  0  0  0
 23 19  1  0  0  0
 24  9  1  0  0  0
 25 10  1  0  0  0
  4  7  2  0  0  0
  9  6  2  0  0  0
 22 19  2  0  0  0
M  END

and this SMARTS:

[OH1]-C(-c1ccccc1)c2ccccc2

I'm using this code to find a substructure:

    mol = Chem.MolFromMolBlock(str(molstring), sanitize=False)
    mol.UpdatePropertyCache(strict=False)
    patt = Chem.MolFromSmarts(str(smarts))
    Chem.GetSSSR(patt)
    Chem.GetSSSR(mol)
    match = mol.HasSubstructMatch(patt)

and `match` is False.

But with this Indigo code:

    mol = indigoObj.loadMolecule(str(molstring))
    patt = indigoObj.loadSmarts(str(smarts))
    match = indigoObj.substructureMatcher(mol).match(patt)

the match is found and I can render it as an image.

Am I missing some flag or doing something wrong?
Michal
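One detail relevant to the RDKit result above: with sanitize=False the molecule never goes through aromaticity perception, so the Kekulé rings in this molfile (bond orders 1 and 2 only) contain no aromatic atoms for the lowercase c in the SMARTS to match. A minimal sketch of that effect, assuming the SMARTS rings are c1ccccc1/c2ccccc2 and using a hypothetical Kekulé SMILES stand-in for CHEMBL265667 instead of the full molfile:

```python
from rdkit import Chem

# Assumption: the archived SMARTS lost characters; benzene rings are assumed.
SMARTS = "[OH1]-C(-c1ccccc1)c2ccccc2"

# Hypothetical stand-in for CHEMBL265667: a diaryl carbinol in Kekule form,
# i.e. alternating single/double ring bonds, as in the molfile.
KEKULE_SMILES = "OC(C1=CC=CC=C1)C1=CC=C(Cl)C=C1"


def matches(smiles, smarts, perceive_aromaticity):
    """Parse without sanitization (as in the question), then run the match."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    mol.UpdatePropertyCache(strict=False)  # implicit Hs, needed for [OH1]
    Chem.GetSymmSSSR(mol)                  # initialise ring information
    if perceive_aromaticity:
        # Flag the Kekule benzene rings as aromatic so 'c' atoms can match.
        Chem.SetAromaticity(mol)
    patt = Chem.MolFromSmarts(smarts)
    return mol.HasSubstructMatch(patt)


print(matches(KEKULE_SMILES, SMARTS, perceive_aromaticity=False))  # False
print(matches(KEKULE_SMILES, SMARTS, perceive_aromaticity=True))   # True
```

Chem.SetAromaticity() is used here because it only perceives aromaticity; a full Chem.SanitizeMol(mol) would also do it, but would bring back the valence checks that sanitize=False presumably avoids.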
Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.
Hi Brian,

The first point you mentioned was actually what I had guessed, and I think that approach should be abandoned in my case. Thanks for the second suggestion; I tried it and the performance improved:

    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi', delimiter='\t')
    l = len(suppl)  # This line is crucial
    suppl = list(suppl)

And the types of suppl are, respectively: , ,

So although the supplier (after len(suppl)) supports indexing, it is indeed not a list. It is amazing that all the molecules were instantiated by the `list` call. :)

Hongbin Yang

From: Brian Kelley
Date: 2016-11-01 19:56
To: 杨弘宾
CC: rdkit-discuss
Subject: Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

> I'll make two more points (thanks to Greg Landrum for pointing this out) [...]
Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.
I'll make two more points (thanks to Greg Landrum for pointing this out):

1) In your code, each call to suppl[i] makes a new molecule, so calling it twice in a row is twice as slow. This explains your last result.

2) In my example, I was assuming that the queries were already in a Python list and not coming from a supplier. If they are being read from a supplier, you can easily keep them all in memory with:

    queries = list(query_supplier)

Note that for large files this can take up a lot of memory.

Thanks for the clarification, Greg.

Brian Kelley

> On Nov 1, 2016, at 4:22 AM, 杨弘宾 wrote:
>
> Hi,
> Suppose I'd like to match 100 substructures against 1000 compounds represented as SMILES. [...]
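A minimal sketch of Brian's two points, assuming a tab-delimited file with no header line (the helper name and example file are hypothetical): indexing a supplier hands back a fresh molecule on every call, whereas iterating it once into a list parses each line a single time and keeps the molecules for reuse.

```python
import os
import tempfile

from rdkit import Chem


def load_molecules_once(path):
    """Return (indexing_gives_new_objects, molecules) for a SMILES file."""
    # Assumption: tab-delimited, no header line, SMILES then name.
    suppl = Chem.SmilesMolSupplier(path, delimiter="\t", titleLine=False)
    # suppl[i] builds a new molecule on every call; nothing is cached.
    indexing_gives_new_objects = suppl[0] is not suppl[0]
    # Iterate once, parsing each line a single time, and keep the results.
    # Convenient, but the whole file then sits in memory.
    suppl.reset()
    mols = [m for m in suppl if m is not None]
    return indexing_gives_new_objects, mols


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.smi")
    with open(path, "w") as fh:
        fh.write("c1ccccc1O\tphenol\nCCO\tethanol\nc1ccccc1\tbenzene\n")
    new_objects, mols = load_molecules_once(path)
    print(new_objects, len(mols))  # True 3
```

The list can then be looped over as many times as there are queries without any further parsing.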
Re: [Rdkit-discuss] Substructure by atom indices
Hi,

There is PathToSubmol(), although it takes a list of bonds. If you have atom indices:

    from itertools import combinations
    from rdkit import Chem

    bonds = []
    atommap = {}
    for i, j in combinations(atom_path, 2):
        b = ParentMol.GetBondBetweenAtoms(i, j)
        if b is not None:
            bonds.append(b.GetIdx())
    NewMol = Chem.PathToSubmol(ParentMol, bonds, atomMap=atommap)

atommap is a dictionary that gets populated with a mapping from atom indices in ParentMol to atom indices in the new molecule.

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-11-01 11:00 GMT+01:00 Juuso Lehtivarjo:
> Hi All,
>
> Is there a Python function (or any simple way whatsoever) to create a substructure mol object from another one based on given atom indices? [...]
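The same approach wrapped as a self-contained helper; this is a sketch and submol_from_atoms is a hypothetical name. Because PathToSubmol works from bonds, a selected atom with no bond to any other selected atom does not appear in the result.

```python
from itertools import combinations

from rdkit import Chem


def submol_from_atoms(parent, atom_indices):
    """Return (submol, atom_map) built from bonds joining the given atoms.

    atom_map maps parent atom indices to indices in the new molecule.
    """
    bond_ids = []
    for i, j in combinations(atom_indices, 2):
        bond = parent.GetBondBetweenAtoms(i, j)
        if bond is not None:          # only directly bonded pairs
            bond_ids.append(bond.GetIdx())
    atom_map = {}                     # filled in place by PathToSubmol
    submol = Chem.PathToSubmol(parent, bond_ids, atomMap=atom_map)
    return submol, atom_map


mol = Chem.MolFromSmiles("c1ccccc1CC")  # ethylbenzene; atoms 0-5 form the ring
ring, amap = submol_from_atoms(mol, [0, 1, 2, 3, 4, 5])
print(ring.GetNumAtoms(), sorted(amap))  # 6 [0, 1, 2, 3, 4, 5]
```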
[Rdkit-discuss] Substructure by atom indices
Hi All,

Is there a Python function (or any simple way whatsoever) to create a substructure mol object from another one based on given atom indices? In C++ this could apparently be done with getMolFragsWithQuery, but that does not seem to be used much in the Python wrappers...

Best,
Juuso
Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.
A supplier is random access, so each suppl[i] call here is probably quite expensive:

    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi', delimiter='\t')
    l = len(suppl)
    for j in range(ll):  # I have to make substructures in the first loop.
        for i in range(l):
            suppl[i].GetSubstructMatches(s[j])

I highly suggest using Python iteration instead of an index, such as:

    for mol in suppl:
        for pat in s:
            mol.GetSubstructMatches(pat)

I expect this will help quite a bit. You may also consider using the FilterCatalog, which is designed to handle larger data sets and may help in your case.

On Tue, Nov 1, 2016 at 4:22 AM, 杨弘宾 wrote:
> Hi,
> Suppose I'd like to match 100 substructures against 1000 compounds represented as SMILES. [...]
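A self-contained sketch of the iteration pattern Brian suggests; count_matches, the file layout (tab-delimited, no header line) and the example data are assumptions. Each query is built once, and each molecule is parsed once and then tested against every query.

```python
import os
import tempfile

from rdkit import Chem


def count_matches(smiles_path, query_smarts):
    """Count (molecule, query) pairs that match, parsing each molecule once."""
    queries = [Chem.MolFromSmarts(s) for s in query_smarts]  # built once
    # Assumption: tab-delimited file with no header line.
    suppl = Chem.SmilesMolSupplier(smiles_path, delimiter="\t", titleLine=False)
    hits = 0
    for mol in suppl:        # sequential iteration, no suppl[i] indexing
        if mol is None:      # skip lines that failed to parse
            continue
        for query in queries:
            if mol.HasSubstructMatch(query):
                hits += 1
    return hits


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.smi")
    with open(path, "w") as fh:
        fh.write("c1ccccc1O\tphenol\nCCO\tethanol\nc1ccccc1\tbenzene\n")
    print(count_matches(path, ["c1ccccc1", "[OX2H]"]))  # 4
```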
[Rdkit-discuss] Problem adding hydrogens to peptides
Dear All,

Enthused by all the great talks at the UGM, for the last couple of days I have been getting more hands-on with RDKit than I have in quite a while! I was keen to work with some peptides/proteins in 3D, but am having some problems when adding hydrogens...

I have uploaded a GIST to demonstrate the issue (apologies: the py3Dmol JS doesn't render in nbviewer, but this doesn't affect understanding):
https://gist.github.com/jepdavidson/f5220187c18be0fc9e119f9da2e7d955

The main problem is that added hydrogens don't automatically get assigned monomer info from the monomer they are being added to, but there are other issues as well: the hydrogens are marked 'HETATM', the occupancy for the ATOM blocks is set to "-nan", and the CONECT block doesn't list the added Hs.

Propagating the monomer info from the amino acids to the added Hs isn't too difficult (you can call atom.GetNeighbors() and take the info from the neighbouring atom), but there are also some preferred (or required?) naming and numbering conventions to adhere to ("H" for the backbone NH, "HA" for the hydrogen on the alpha carbon, etc.).

Perhaps I am missing something here (a secret 'flavour' option? :)), but if not, it would be interesting to hear what behaviour others would expect when adding explicit hydrogens (I think the same issues will relate to any sequence where monomer information is present).

Kind regards
James
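A rough sketch of the propagation James describes: after Chem.AddHs, each hydrogen without PDB residue info gets a copy of its heavy-atom neighbour's residue name, number, chain and ATOM/HETATM flag. The helper name, the occupancy/temperature-factor defaults and the atom-naming rule ("H" plus the parent's name without its element symbol, so N gives H and CA gives HA) are hypothetical; the rule does not number multiple hydrogens (HB2/HB3) or follow the full PDB naming convention.

```python
from rdkit import Chem


def label_added_hydrogens(mol):
    """Give each hydrogen lacking PDB info a copy of its neighbour's residue info."""
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() != 1 or atom.GetPDBResidueInfo() is not None:
            continue
        neighbours = atom.GetNeighbors()
        if not neighbours:
            continue
        heavy = neighbours[0]
        parent = heavy.GetPDBResidueInfo()
        if parent is None:
            continue
        # Hypothetical naming rule: strip the parent's element symbol.
        name = parent.GetName().strip()
        symbol = heavy.GetSymbol().upper()
        suffix = name[len(symbol):] if name.upper().startswith(symbol) else name
        info = Chem.AtomPDBResidueInfo()
        info.SetName(" {:<3}".format("H" + suffix)[:4])  # 4-character PDB field
        info.SetResidueName(parent.GetResidueName())
        info.SetResidueNumber(parent.GetResidueNumber())
        info.SetChainId(parent.GetChainId())
        info.SetInsertionCode(parent.GetInsertionCode())
        info.SetIsHeteroAtom(parent.GetIsHeteroAtom())  # ATOM vs HETATM
        info.SetOccupancy(1.0)    # assumed default
        info.SetTempFactor(0.0)   # assumed default
        atom.SetMonomerInfo(info)
    return mol


peptide = Chem.AddHs(Chem.MolFromSequence("GA"))
label_added_hydrogens(peptide)
```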
[Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.
Hi,

Suppose I'd like to match 100 substructures against 1000 compounds represented as SMILES. What I did is:

    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi', delimiter='\t')
    l = len(suppl)
    for j in range(ll):  # I have to make substructures in the first loop.
        for i in range(l):
            suppl[i].GetSubstructMatches(s[j])

and I found that the performance is not good.

Then I did a comparison and found that it was because the compounds were not initialized (parsed) up front. If I use MolFromSmiles instead, the performance improves a lot:

    start = time.clock()
    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi', delimiter='\t')
    l = len(suppl)
    print time.clock() - start  # >>> 0.0373735355168, indicating that the molecules were not initialized.
    for i in range(l):
        suppl[i].GetSubstructMatches(sa)
        suppl[i].GetSubstructMatches(sa2)
    print time.clock() - start  # >>> 11.1884715172

    start = time.clock()
    f = open('allmoleculenew.smi')
    for i in range(l):
        mol = Chem.MolFromSmiles(f.next().split('\t')[0])
        mol.GetSubstructMatches(sa)
        mol.GetSubstructMatches(sa2)
    print time.clock() - start  # >>> 5.44030582111

The second method was about twice as fast as the first, indicating that the "init" is more time-consuming than the matching. I think SmilesMolSupplier is a good API for loading multiple compounds, but it does not parse the SMILES immediately, which adds time to later processing. So is it possible to manually initialize the compounds?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology