Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

Brian Kelley Tue, 01 Nov 2016 04:58:35 -0700

I'll make two more points (thanks to Greg Landrum for pointing these out):

1) In your code, each call to suppl[i] parses a new molecule, so calling it 
twice in a row is twice as slow.  This explains your last result.
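
A minimal sketch of the fix for point 1, assuming `suppl` behaves like an RDKit supplier (supports len() and indexing, and returns None for entries that fail to parse). The function name `match_each_once` is hypothetical, not part of RDKit:

```python
def match_each_once(suppl, queries):
    """Return {index: [matches for each query]}, reading each molecule once.

    `suppl` is assumed to be an indexable supplier whose suppl[i] builds a
    new molecule on every access; `queries` is a list of query molecules.
    """
    results = {}
    for i in range(len(suppl)):
        mol = suppl[i]      # the only indexed access for this molecule
        if mol is None:     # entry that could not be parsed
            continue
        # Reuse the same molecule object for every query.
        results[i] = [mol.GetSubstructMatches(q) for q in queries]
    return results
```

With the names from the original message, this would be called roughly as match_each_once(suppl, [sa, sa2]), so each line of the file is parsed once instead of once per query.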


2) In my example, I was assuming that the queries were already in a Python 
list and not coming from a supplier.  If they are being read from a supplier, 
you can easily keep them all in memory with:

queries = list(query_supplier)

Note that for large files, this can take up a lot of memory.
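
As a rough sketch of how the in-memory approach might look end to end: the helper names `load_molecules` and `count_hits` are hypothetical, and dropping None entries (which suppliers return for unparseable records) is an addition on top of the plain list() call above.

```python
def load_molecules(supplier):
    """Parse every entry once and keep the successfully parsed ones in a list."""
    return [mol for mol in supplier if mol is not None]


def count_hits(molecules, queries):
    """For each query, count how many in-memory molecules contain it."""
    return [
        # GetSubstructMatches returns an empty (falsy) result when nothing matches.
        sum(1 for mol in molecules if mol.GetSubstructMatches(q))
        for q in queries
    ]
```

Here the parsing cost is paid once in load_molecules, and the nested loop in count_hits only does matching. The same helper could be used for the compounds and for the queries, memory permitting.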

Thanks for the clarification, Greg.
----
Brian Kelley

> On Nov 1, 2016, at 4:22 AM, 杨弘宾 <yanyangh...@163.com> wrote:
> 
> Hi,
>     Suppose I'd like to match 100 substructures against 1000 compounds 
> represented as SMILES.
> What I did is:
> 
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> for j in range(ll):  # I have to make substructures in the first loop.
>     for i in range(l): 
>         suppl[i].GetSubstructMatches(s[j]) 
> and found that the performance was not good.
> 
> Then I did a comparison and found that it was because the conformations of 
> the compounds were not initialized.
> If I use MolFromSmiles, the performance improves a lot.
> start = time.clock()
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t') 
> l=len(suppl) 
> print time.clock()-start   # >>> 0.0373735355168
> # indicating that the molecules were not initialized.
> for i in range(l): 
>     suppl[i].GetSubstructMatches(sa) 
>     suppl[i].GetSubstructMatches(sa2) 
> print time.clock()-start   # >>> 11.1884715172
> start = time.clock() 
> f = open('allmoleculenew.smi') 
> for i in range(l): 
>     mol = Chem.MolFromSmiles(f.next().split('\t')[0]) 
>     mol.GetSubstructMatches(sa) 
>     mol.GetSubstructMatches(sa2)
> print time.clock()-start # >>> 5.44030582111
> 
> The second method was twice as fast as the first, indicating that the 
> "init" is more time-consuming than the matching itself.
> I think SmilesMolSupplier is a good API for loading multiple compounds, but it 
> does not parse the SMILES immediately, which adds parsing time to every 
> later access. So is it possible to initialize the compounds manually?
> 
> Hongbin Yang 杨弘宾 
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy 
> East China University of Science and Technology 
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
