Two more points (thanks to Greg Landrum for pointing these out):
1) In your code, each call to suppl[i] builds a new molecule, so calling it
twice in a row takes twice as long. This explains your last result.
2) In my example, I was assuming that the queries were already in a Python list
and not coming from a supplier. If they are being read from a supplier, you can
easily keep them all in memory with:
    queries = list(query_supplier)
Note that for large files, this can take up a lot of memory.
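To make both points concrete without depending on RDKit, here is a minimal
sketch using a toy stand-in for a supplier. CountingSupplier, parse_count and
the sample records are hypothetical names made up for this illustration; the
class only mimics the behaviour described above (a new object is built on every
suppl[i]) and is not how SmilesMolSupplier is actually implemented.

```python
class CountingSupplier:
    """Toy stand-in for a lazy supplier: builds a new object on every access."""

    def __init__(self, records):
        self._records = list(records)
        self.parse_count = 0  # how many times a record has been parsed

    def _parse(self, record):
        self.parse_count += 1
        return {"smiles": record}  # placeholder for an expensive molecule build

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        # Like suppl[i] in the thread: parse again on every call, no caching.
        return self._parse(self._records[index])

    def __iter__(self):
        for record in self._records:
            yield self._parse(record)


records = ["CCO", "c1ccccc1", "CC(=O)O"]
queries = ["q1", "q2"]  # placeholders for substructure queries

# Pattern from the original post: index the supplier inside the query loop.
lazy = CountingSupplier(records)
for _ in queries:
    for i in range(len(lazy)):
        lazy[i]
parses_lazy = lazy.parse_count  # one parse per (query, record) pair

# Materialise once, then reuse the parsed objects for every query.
cached_source = CountingSupplier(records)
mols = list(cached_source)
for _ in queries:
    for mol in mols:
        pass  # the matching call would go here
parses_cached = cached_source.parse_count  # one parse per record
```

With a real supplier the same idea should be mols = list(suppl) before the
query loop, at the memory cost mentioned above.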
Thanks for the clarification, Greg.
----
Brian Kelley
> On Nov 1, 2016, at 4:22 AM, 杨弘宾 <yanyangh...@163.com> wrote:
>
> Hi,
> Suppose I'd like to match 100 substructures against 1000 compounds
> represented as SMILES.
> What I did is:
>
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> for j in range(ll): # I have to make substructures in the first loop.
>     for i in range(l):
>         suppl[i].GetSubstructMatches(s[j])
> and found that the performance was not good.
>
> Then I did a comparison and found that it was because the molecules
> were not initialized.
> If I use MolFromSmiles, the performance improves a lot.
> start = time.clock()
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l=len(suppl)
> print time.clock()-start # >>> 0.0373735355168
> # (indicating that the molecules were not initialized)
> for i in range(l):
>     suppl[i].GetSubstructMatches(sa)
>     suppl[i].GetSubstructMatches(sa2)
> print time.clock()-start # >>> 11.1884715172
>
> start = time.clock()
> f = open('allmoleculenew.smi')
> for i in range(l):
>     mol = Chem.MolFromSmiles(f.next().split('\t')[0])
>     mol.GetSubstructMatches(sa)
>     mol.GetSubstructMatches(sa2)
> print time.clock()-start # >>> 5.44030582111
>
> The second method was twice as fast as the first, indicating that the
> "init" is more time-consuming than the matching.
> I think SmilesMolSupplier is a good API for loading multiple compounds, but it
> does not parse the SMILES immediately, which adds time to any later
> processing. So is it possible to initialize the compounds manually?
>
> Hongbin Yang 杨弘宾
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy
> East China University of Science and Technology
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss