Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

Brian Kelley Tue, 01 Nov 2016 02:41:11 -0700

A supplier is random access, so your call to suppl[i] in the inner loop here
is probably quite expensive:


suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
for j in range(ll):  # I have to make substructures in the first loop.
    for i in range(l):
        suppl[i].GetSubstructMatches(s[j])

I highly suggest iterating over the supplier directly, with the molecules in
the outer loop, rather than indexing into it. For example:

for mol in suppl:
    for pat in s:
        mol.GetSubstructMatches(pat)

I expect this will help quite a bit.  You may also consider using the
FilterCatalog which is designed to handle larger data sets and may help in
your case.
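
As a rough illustration of why the loop order matters, here is a toy sketch in
plain Python. ToySupplier is a hypothetical stand-in that re-parses a record on
every indexed access. It is not RDKit code, and it only assumes that a
random-access supplier without a cache behaves similarly. The sketch counts
how many parses each loop order triggers, using the 1000 compounds and 100
patterns from the question:

```python
class ToySupplier:
    """Hypothetical stand-in for a lazy, random-access supplier (not RDKit)."""

    def __init__(self, records):
        self._records = list(records)
        self.parse_count = 0  # number of simulated parses so far

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        # Assumption: every indexed access pays the parsing cost again.
        self.parse_count += 1
        return self._records[index]

    def __iter__(self):
        # Iteration touches each record exactly once.
        for index in range(len(self)):
            yield self[index]


def count_parses_indexed(records, patterns):
    """Patterns in the outer loop, supplier indexed in the inner loop."""
    supplier = ToySupplier(records)
    for _pattern in patterns:
        for i in range(len(supplier)):
            _molecule = supplier[i]  # re-parsed for every pattern
    return supplier.parse_count


def count_parses_iterated(records, patterns):
    """Molecules in the outer loop, each one reused for all patterns."""
    supplier = ToySupplier(records)
    for _molecule in supplier:
        for _pattern in patterns:
            pass  # the substructure match would go here
    return supplier.parse_count


records = ["c1ccccc1"] * 1000   # placeholder SMILES strings
patterns = ["[OH]"] * 100       # placeholder patterns
indexed = count_parses_indexed(records, patterns)
iterated = count_parses_iterated(records, patterns)
print(indexed, iterated)
```

Under that assumption, the indexed loop order performs one parse per
(pattern, molecule) pair, while the iteration order performs one parse per
molecule, regardless of how many patterns are matched.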

On Tue, Nov 1, 2016 at 4:22 AM, 杨弘宾 <yanyangh...@163.com> wrote:

> Hi,
>     Suppose I'd like to match 100 substructures against 1000 compounds
> represented as SMILES.
> What I did is:
>
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> for j in range(ll):  # I have to make substructures in the first loop.
>     for i in range(l):
>         suppl[i].GetSubstructMatches(s[j])
> and found that the performance was not good.
>
> Then I did a comparison and found that it was because the conformations of
> the compounds were not initialized.
> If I use MolFromSmiles, the performance improves a lot.
> start = time.clock()
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l=len(suppl)
> print time.clock()-start   # >>> 0.0373735355168
> # (indicating that the molecules were not initialized)
> for i in range(l):
>     suppl[i].GetSubstructMatches(sa)
>     suppl[i].GetSubstructMatches(sa2)
> print time.clock()-start   # >>> 11.1884715172
> start = time.clock()
> f = open('allmoleculenew.smi')
> for i in range(l):
>     mol = Chem.MolFromSmiles(f.next().split('\t')[0])
>     mol.GetSubstructMatches(sa)
>     mol.GetSubstructMatches(sa2)
> print time.clock()-start # >>> 5.44030582111
>
> The second method was twice as fast as the first, indicating that the
> "init" is more time-consuming than the matching itself.
> I think SmilesMolSupplier is a good API for loading multiple compounds, but
> it does not parse the SMILES immediately, which adds time to any later use.
> So is it possible to initialize the compounds manually?
>
> ------------------------------
> Hongbin Yang 杨弘宾
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy
> East China University of Science and Technology
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
