Re: [Rdkit-discuss] HasSubstructMatch return False where it shouldn't

2016-11-01 Thread Maciek Wójcikowski
Hi Michał,

Have you tried using AdjustQueryProperties()? I think Greg mentioned it in
his presentation at the UGM.

http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#AdjustQueryProperties
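As a rough illustration of what `AdjustQueryProperties()` does: it post-processes the query molecule, not the molecule being searched. The sketch below turns a dummy atom in a SMILES-derived query into an "any atom" query. It assumes an RDKit build where `Chem.AdjustQueryParameters` exposes the `makeDummiesQueries` and `adjustDegree` attributes; the toluene/phenyl molecules are arbitrary examples, not taken from this thread.

```python
from rdkit import Chem

query = Chem.MolFromSmiles('*c1ccccc1')   # '*' is a dummy atom (atomic number 0)
toluene = Chem.MolFromSmiles('Cc1ccccc1')

# A plain molecule used as a query: the dummy only matches another dummy atom.
print(toluene.HasSubstructMatch(query))

params = Chem.AdjustQueryParameters()
params.makeDummiesQueries = True   # turn dummy atoms into "match any atom" queries
params.adjustDegree = False        # do not add degree constraints in this sketch
adjusted = Chem.AdjustQueryProperties(query, params)

# After adjustment the dummy can match the methyl carbon of toluene.
print(toluene.HasSubstructMatch(adjusted))
```

Because the adjustment only touches the query, this alone may not help when the target molecule itself lacks the properties (such as aromatic flags) that the query expects.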


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-11-01 19:20 GMT+01:00 Michał Nowotka :

> Hi,
>
> I have this molfile (CHEMBL265667):
>
>
>           11280714432D 1   1.00000     0.00000     0
>
>  25 27  0     0  0            999 V2000
>     3.8042   -1.6000    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.3167   -1.9000    0.0000 N   0  0  3  0  0  0           0  0  0
>     3.8042   -1.0000    0.0000 N   0  0  0  0  0  0           0  0  0
>     4.8417   -1.6000    0.0000 N   0  0  0  0  0  0           0  0  0
>     4.3167   -2.5000    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.3167   -3.6917    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.8417   -1.0000    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.3167   -0.7000    0.0000 C   0  0  0  0  0  0           0  0  0
>     3.7917   -3.3917    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.8375   -3.3917    0.0000 C   0  0  0  0  0  0           0  0  0
>     3.8000   -2.7917    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.8375   -2.7917    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.3167   -4.2917    0.0000 C   0  0  3  0  0  0           0  0  0
>     3.2875   -1.8917    0.0000 O   0  0  0  0  0  0           0  0  0
>     4.8375   -4.5917    0.0000 C   0  0  0  0  0  0           0  0  0
>     4.3167   -0.0917    0.0000 O   0  0  0  0  0  0           0  0  0
>     4.8292   -5.1917    0.0000 C   0  0  0  0  0  0           0  0  0
>     5.3500   -4.2917    0.0000 C   0  0  0  0  0  0           0  0  0
>     5.8667   -5.1917    0.0000 C   0  0  0  0  0  0           0  0  0
>     3.7917   -4.5917    0.0000 O   0  0  0  0  0  0           0  0  0
>     5.8667   -4.5917    0.0000 C   0  0  0  0  0  0           0  0  0
>     5.3542   -5.4917    0.0000 C   0  0  0  0  0  0           0  0  0
>     6.3917   -5.4917    0.0000 Cl  0  0  0  0  0  0           0  0  0
>     3.2750   -3.6917    0.0000 C   0  0  0  0  0  0           0  0  0
>     5.3542   -3.6917    0.0000 C   0  0  0  0  0  0           0  0  0
>   2  1  1  0     0  0
>   3  1  1  0     0  0
>   4  2  1  0     0  0
>   5  2  1  0     0  0
>   6 10  1  0     0  0
>   7  8  1  0     0  0
>   8  3  1  0     0  0
>   9 11  1  0     0  0
>  10 12  2  0     0  0
>  11  5  2  0     0  0
>  12  5  1  0     0  0
>  13  6  1  0     0  0
>  14  1  2  0     0  0
>  15 13  1  0     0  0
>  16  8  2  0     0  0
>  17 15  2  0     0  0
>  18 15  1  0     0  0
>  19 21  1  0     0  0
>  20 13  1  0     0  0
>  21 18  2  0     0  0
>  22 17  1  0     0  0
>  23 19  1  0     0  0
>  24  9  1  0     0  0
>  25 10  1  0     0  0
>   4  7  2  0     0  0
>   9  6  2  0     0  0
>  22 19  2  0     0  0
> M  END
>
> and this SMARTS: [OH1]-C(-c1ccccc1)c2ccccc2
>
> I'm using this code to find a substructure:
>
> mol = Chem.MolFromMolBlock(str(molstring), sanitize=False)
> mol.UpdatePropertyCache(strict=False)
> patt = Chem.MolFromSmarts(str(smarts))
> Chem.GetSSSR(patt)
> Chem.GetSSSR(mol)
> match = mol.HasSubstructMatch(patt)
>
> and `match` is False.
>
> But with indigo code:
>
> mol = indigoObj.loadMolecule(str(molstring))
> patt = indigoObj.loadSmarts(str(smarts))
> match = indigoObj.substructureMatcher(mol).match(patt)
>
> match is valid and I can render this to image:
>
> Am I missing some flag or doing something wrong?
>
> --
>
> Michal
>
> 
> --
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


[Rdkit-discuss] HasSubstructMatch return False where it shouldn't

2016-11-01 Thread Michał Nowotka
Hi,

I have this molfile (CHEMBL265667):


          11280714432D 1   1.00000     0.00000     0

 25 27  0     0  0            999 V2000
    3.8042   -1.6000    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3167   -1.9000    0.0000 N   0  0  3  0  0  0           0  0  0
    3.8042   -1.0000    0.0000 N   0  0  0  0  0  0           0  0  0
    4.8417   -1.6000    0.0000 N   0  0  0  0  0  0           0  0  0
    4.3167   -2.5000    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3167   -3.6917    0.0000 C   0  0  0  0  0  0           0  0  0
    4.8417   -1.0000    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3167   -0.7000    0.0000 C   0  0  0  0  0  0           0  0  0
    3.7917   -3.3917    0.0000 C   0  0  0  0  0  0           0  0  0
    4.8375   -3.3917    0.0000 C   0  0  0  0  0  0           0  0  0
    3.8000   -2.7917    0.0000 C   0  0  0  0  0  0           0  0  0
    4.8375   -2.7917    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3167   -4.2917    0.0000 C   0  0  3  0  0  0           0  0  0
    3.2875   -1.8917    0.0000 O   0  0  0  0  0  0           0  0  0
    4.8375   -4.5917    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3167   -0.0917    0.0000 O   0  0  0  0  0  0           0  0  0
    4.8292   -5.1917    0.0000 C   0  0  0  0  0  0           0  0  0
    5.3500   -4.2917    0.0000 C   0  0  0  0  0  0           0  0  0
    5.8667   -5.1917    0.0000 C   0  0  0  0  0  0           0  0  0
    3.7917   -4.5917    0.0000 O   0  0  0  0  0  0           0  0  0
    5.8667   -4.5917    0.0000 C   0  0  0  0  0  0           0  0  0
    5.3542   -5.4917    0.0000 C   0  0  0  0  0  0           0  0  0
    6.3917   -5.4917    0.0000 Cl  0  0  0  0  0  0           0  0  0
    3.2750   -3.6917    0.0000 C   0  0  0  0  0  0           0  0  0
    5.3542   -3.6917    0.0000 C   0  0  0  0  0  0           0  0  0
  2  1  1  0     0  0
  3  1  1  0     0  0
  4  2  1  0     0  0
  5  2  1  0     0  0
  6 10  1  0     0  0
  7  8  1  0     0  0
  8  3  1  0     0  0
  9 11  1  0     0  0
 10 12  2  0     0  0
 11  5  2  0     0  0
 12  5  1  0     0  0
 13  6  1  0     0  0
 14  1  2  0     0  0
 15 13  1  0     0  0
 16  8  2  0     0  0
 17 15  2  0     0  0
 18 15  1  0     0  0
 19 21  1  0     0  0
 20 13  1  0     0  0
 21 18  2  0     0  0
 22 17  1  0     0  0
 23 19  1  0     0  0
 24  9  1  0     0  0
 25 10  1  0     0  0
  4  7  2  0     0  0
  9  6  2  0     0  0
 22 19  2  0     0  0
M  END

and this SMARTS: [OH1]-C(-c1ccccc1)c2ccccc2

I'm using this code to find a substructure:

mol = Chem.MolFromMolBlock(str(molstring), sanitize=False)
mol.UpdatePropertyCache(strict=False)
patt = Chem.MolFromSmarts(str(smarts))
Chem.GetSSSR(patt)
Chem.GetSSSR(mol)
match = mol.HasSubstructMatch(patt)

and `match` is False.

But with indigo code:

mol = indigoObj.loadMolecule(str(molstring))
patt = indigoObj.loadSmarts(str(smarts))
match = indigoObj.substructureMatcher(mol).match(patt)

match is valid and I can render this to image:

Am I missing some flag or doing something wrong?

--

Michal
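One likely cause, offered here as an assumption to verify rather than a confirmed diagnosis: with `sanitize=False`, RDKit does not perceive aromaticity, so the Kekulé rings in the molfile never get aromatic flags and the SMARTS `c` atoms cannot match them. The sketch below uses a hypothetical Kekulé SMILES for an Ar-CH(OH)-Ar compound as a stand-in for the CHEMBL265667 molfile and otherwise mirrors the calls in the original snippet.

```python
from rdkit import Chem

# Hypothetical stand-in for the molfile: Ar-CH(OH)-Ar with Kekulé (single/double) rings.
KEKULE_SMILES = 'OC(C1=CC=CC=C1)C1=CC=C(Cl)C=C1'
SMARTS = '[OH1]-C(-c1ccccc1)c2ccccc2'

def has_match(sanitize):
    mol = Chem.MolFromSmiles(KEKULE_SMILES, sanitize=False)
    # Computes implicit H counts (needed for [OH1]) but perceives no aromaticity.
    mol.UpdatePropertyCache(strict=False)
    Chem.GetSSSR(mol)
    if sanitize:
        # Full sanitization includes aromaticity perception, so ring atoms become 'c'.
        Chem.SanitizeMol(mol)
    patt = Chem.MolFromSmarts(SMARTS)
    return mol.HasSubstructMatch(patt)

print(has_match(sanitize=False))  # no ring atom carries an aromatic flag
print(has_match(sanitize=True))
```

If full sanitization is unwanted, a partial `Chem.SanitizeMol(mol, sanitizeOps=...)` that includes `Chem.SanitizeFlags.SANITIZE_SETAROMATICITY` may be enough, but that variant is not checked here.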


Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

2016-11-01 Thread 杨弘宾

Hi, Brian,

The first point you mentioned was actually what I suspected, and I think it is
deprecated in my context. Thanks for the second suggestion; I tried it and the
performance improved:

suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)  # This line is crucial
suppl = list(suppl)

And the types of suppl are, respectively: , ,
So, although the second suppl (after len(suppl)) can be indexed, it is indeed
not a list. It is amazing that all the molecules were instantiated by the
`list` call.
: )

Hongbin Yang

From: Brian Kelley
Date: 2016-11-01 19:56
To: 杨弘宾
CC: rdkit-discuss
Subject: Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

I'll make two more points (thanks to Greg Landrum for pointing this out):
1) In your code each call to suppl[i] makes a new molecule, so calling it
twice in a row is twice as slow. This explains your last result.
2) In my example, I was assuming that the queries were already in a Python
list and not from a supplier. If they are being read from a supplier, you
can easily keep them all in memory with:
queries = list(query_supplier)

Note that for large files, this can take up a lot of memory.
Thanks for the clarification Greg.
Brian Kelley

On Nov 1, 2016, at 4:22 AM, 杨弘宾  wrote:

Hi,
Supposing I'd like to match 100 substructures against 1000 compounds
represented as SMILES. What I did is:

suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
for j in range(ll):  # I have to make substructures in the first loop.
    for i in range(l):
        suppl[i].GetSubstructMatches(s[j])

and found the performance is not good.
Then I did a comparison and found that it was because the compounds were not
initialized. If I use MolFromSmiles, the performance improves a lot.

start = time.clock()
suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
print time.clock()-start   # >>> 0.0373735355168  indicating that the molecules were not initiated.
for i in range(l):
    suppl[i].GetSubstructMatches(sa)
    suppl[i].GetSubstructMatches(sa2)
print time.clock()-start   # >>> 11.1884715172

start = time.clock()
f = open('allmoleculenew.smi')
for i in range(l):
    mol = Chem.MolFromSmiles(f.next().split('\t')[0])
    mol.GetSubstructMatches(sa)
    mol.GetSubstructMatches(sa2)
print time.clock()-start   # >>> 5.44030582111

The second method was twice as fast as the first, indicating that the "init"
is more time-consuming than the matching. I think SmilesMolSupplier is a good
API for loading multiple compounds, but it does not parse the SMILES
immediately, which adds time to the later steps. So is it possible to
manually initialize the compounds?


Hongbin Yang 杨弘宾

Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy

East China University of Science and Technology 



Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

2016-11-01 Thread Brian Kelley
I'll make two more points (thanks to Greg Landrum for pointing this out):

1) In your code each call to suppl[i] makes a new molecule, so calling it twice
in a row is twice as slow.  This explains your last result.

2) In my example, I was assuming that the queries were already in a Python list
and not from a supplier.  If they are being read from a supplier, you can
easily keep them all in memory with:

queries = list(query_supplier)

Note that for large files, this can take up a lot of memory.
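A minimal sketch of the pattern from points 1) and 2): keep the small query set in a list, then stream the larger target file once so that every molecule is parsed exactly once. `count_hits` is a hypothetical helper; the SMARTS, the temporary file, the tab-separated SMILES/name layout and `titleLine=False` (no header line) are assumptions for illustration.

```python
import os
import tempfile
from rdkit import Chem

def count_hits(target_path, query_smarts):
    """Count how many target molecules match each query (hypothetical helper)."""
    # Queries are few, so keep them all in memory as a list.
    queries = [Chem.MolFromSmarts(s) for s in query_smarts]
    counts = [0] * len(queries)
    # Iterating the supplier parses each SMILES once; no repeated suppl[i] lookups.
    suppl = Chem.SmilesMolSupplier(target_path, delimiter='\t', titleLine=False)
    for mol in suppl:
        if mol is None:  # unparsable SMILES come back as None
            continue
        for k, query in enumerate(queries):
            if mol.HasSubstructMatch(query):
                counts[k] += 1
    return counts

# Small demo file: SMILES<TAB>name, no header line.
with tempfile.NamedTemporaryFile('w', suffix='.smi', delete=False) as fh:
    fh.write('c1ccccc1O\tphenol\nCCO\tethanol\nc1ccccc1\tbenzene\n')
    path = fh.name
try:
    result = count_hits(path, ['[OX2H]', 'c1ccccc1', '[CH3]'])
finally:
    os.remove(path)
print(result)  # [2, 2, 1]
```

Only the query list is held in memory here; the targets are never materialized, which sidesteps the memory concern for large target files.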

Thanks for the clarification Greg.

Brian Kelley

> On Nov 1, 2016, at 4:22 AM, 杨弘宾  wrote:
> 
> Hi,
> Supposing I'd like to match 100 substructures against 1000 compounds 
> represented as SMILES.
> What I did is:
> 
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> for j in range(ll):  # I have to make substructures in the first loop.
>     for i in range(l):
>         suppl[i].GetSubstructMatches(s[j])
> and found the performance is not good.
> 
> Then I did a comparison and found that it was because the compounds were
> not initialized.
> If I use MolFromSmiles, the performance improves a lot.
> start = time.clock()
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> print time.clock()-start   # >>> 0.0373735355168  indicating that the molecules were not initiated.
> for i in range(l):
>     suppl[i].GetSubstructMatches(sa)
>     suppl[i].GetSubstructMatches(sa2)
> print time.clock()-start   # >>> 11.1884715172
> start = time.clock()
> f = open('allmoleculenew.smi')
> for i in range(l):
>     mol = Chem.MolFromSmiles(f.next().split('\t')[0])
>     mol.GetSubstructMatches(sa)
>     mol.GetSubstructMatches(sa2)
> print time.clock()-start   # >>> 5.44030582111
> 
> The second method was twice as fast as the first, indicating that the
> "init" is more time-consuming than the matching.
> I think SmilesMolSupplier is a good API for loading multiple compounds, but
> it does not parse the SMILES immediately, which adds time to the later
> steps. So is it possible to manually initialize the compounds?
> 
> Hongbin Yang 杨弘宾 
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy 
> East China University of Science and Technology 


Re: [Rdkit-discuss] Substructure by atom indices

2016-11-01 Thread Maciek Wójcikowski
Hi,

There is PathToSubmol(), although it takes a list of bonds. If you have
atom indices:

from itertools import combinations

bonds = []
atommap = {}

for i, j in combinations(atom_path, 2):
    b = ParentMol.GetBondBetweenAtoms(i, j)
    if b:
        bonds.append(b.GetIdx())

NewMol = Chem.PathToSubmol(ParentMol, bonds, atomMap=atommap)

atommap is a dictionary populated with the atom index mapping from ParentMol
to the new molecule.

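The same approach wrapped in a self-contained function, as a sketch: `submol_from_atoms` is a hypothetical helper name, and the ethyl-benzene-style parent molecule and atom indices are illustrative. The fragment is sanitized afterwards so its implicit hydrogens are recomputed.

```python
from itertools import combinations
from rdkit import Chem

def submol_from_atoms(parent, atom_indices):
    """Extract the substructure spanned by atom_indices (hypothetical helper)."""
    bond_indices = []
    for i, j in combinations(atom_indices, 2):
        bond = parent.GetBondBetweenAtoms(i, j)
        if bond is not None:
            bond_indices.append(bond.GetIdx())
    atom_map = {}  # filled by PathToSubmol: parent atom index -> new atom index
    sub = Chem.PathToSubmol(parent, bond_indices, atomMap=atom_map)
    return sub, atom_map

parent = Chem.MolFromSmiles('c1ccccc1CCO')   # atoms 6, 7, 8 are C, C, O
sub, atom_map = submol_from_atoms(parent, [6, 7, 8])
Chem.SanitizeMol(sub)                         # recompute implicit Hs for the fragment
print(Chem.MolToSmiles(sub))                  # CCO
print(atom_map)
```

Since PathToSubmol is driven by bonds, an atom index that is not bonded to any other selected atom appears to be left out of the result; that is inferred from the bond-based API rather than tested here.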


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-11-01 11:00 GMT+01:00 Juuso Lehtivarjo :

> Hi All,
>
> Is there a python function (or any simple way whatsoever) to create a
> substructure mol object from another one based on the given atom
> indices? In C++ this could apparently be done with
> getMolFragsWithQuery, but that does not seem to be much used in python
> wrappers...
>
> Best,
>Juuso
>


[Rdkit-discuss] Substructure by atom indices

2016-11-01 Thread Juuso Lehtivarjo
Hi All,

Is there a python function (or any simple way whatsoever) to create a
substructure mol object from another one based on the given atom
indices? In C++ this could apparently be done with
getMolFragsWithQuery, but that does not seem to be much used in python
wrappers...

Best,
   Juuso



Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

2016-11-01 Thread Brian Kelley
A supplier is random access, so your call to suppl[i] here is probably quite
expensive:

suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
for j in range(ll):  # I have to make substructures in the first loop.
    for i in range(l):
        suppl[i].GetSubstructMatches(s[j])

I highly suggest using Python iteration instead of indexing, for example:

for mol in suppl:
    for pat in s:
        mol.GetSubstructMatches(pat)

I expect this will help quite a bit.  You may also consider using the
FilterCatalog which is designed to handle larger data sets and may help in
your case.
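For the FilterCatalog suggestion, a small sketch assuming the `rdkit.Chem.FilterCatalog` module with `SmartsMatcher(name, smarts, minCount)` and `FilterCatalogEntry(name, matcher)`. The pattern names and SMARTS are made up for illustration; in practice they would be the ~100 substructures. Each pattern becomes a catalog entry, and one call reports every entry that hits a molecule.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogEntry, SmartsMatcher

# Hypothetical named patterns standing in for the real substructure list.
PATTERNS = {
    'hydroxyl': '[OX2H]',
    'benzene ring': 'c1ccccc1',
}

catalog = FilterCatalog()
for name, smarts in PATTERNS.items():
    # minCount=1: the entry fires if the pattern occurs at least once.
    catalog.AddEntry(FilterCatalogEntry(name, SmartsMatcher(name, smarts, 1)))

def matched_names(mol):
    """Return the sorted descriptions of all catalog entries that match mol."""
    return sorted(entry.GetDescription() for entry in catalog.GetMatches(mol))

for smi in ['c1ccccc1O', 'CCO', 'CCC']:
    print(smi, matched_names(Chem.MolFromSmiles(smi)))
```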

On Tue, Nov 1, 2016 at 4:22 AM, 杨弘宾  wrote:

> Hi,
> Supposing I'd like to match 100 substructures against 1000 compounds
> represented as SMILES.
> What I did is:
>
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> for j in range(ll):  # I have to make substructures in the first loop.
>     for i in range(l):
>         suppl[i].GetSubstructMatches(s[j])
> and found the performance is not good.
>
> Then I did a comparison and found that it was because the compounds were
> not initialized.
> If I use MolFromSmiles, the performance improves a lot.
> start = time.clock()
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> print time.clock()-start   # >>> 0.0373735355168  indicating that the molecules were not initiated.
> for i in range(l):
>     suppl[i].GetSubstructMatches(sa)
>     suppl[i].GetSubstructMatches(sa2)
> print time.clock()-start   # >>> 11.1884715172
> start = time.clock()
> f = open('allmoleculenew.smi')
> for i in range(l):
>     mol = Chem.MolFromSmiles(f.next().split('\t')[0])
>     mol.GetSubstructMatches(sa)
>     mol.GetSubstructMatches(sa2)
> print time.clock()-start   # >>> 5.44030582111
>
> The second method was twice as fast as the first, indicating that the
> "init" is more time-consuming than the matching.
> I think SmilesMolSupplier is a good API for loading multiple compounds, but
> it does not parse the SMILES immediately, which adds time to the later
> steps. So is it possible to manually initialize the compounds?
>
> --
> Hongbin Yang 杨弘宾
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy
> East China University of Science and Technology
>


[Rdkit-discuss] Problem adding hydrogens to peptides

2016-11-01 Thread James Davidson
Dear All,

Enthused by all the great talks at the UGM, for the last couple of days I have 
been getting more hands-on with RDKit than I have in quite a while!
I was keen to work with some peptides/proteins in 3D, but am having some 
problems when adding hydrogens...

I have uploaded a GIST to demonstrate the issue (apologies - the py3Dmol js 
doesn't render in the nbviewer, but this doesn't affect understanding):
https://gist.github.com/jepdavidson/f5220187c18be0fc9e119f9da2e7d955

The main problem is that added hydrogens don't automatically get assigned 
monomer info from the monomer they are being added to, but there are other 
issues as well (the hydrogens are marked 'HETATM', the occupancy for the ATOM 
blocks is set to "-nan", and the CONECT block doesn't list the added Hs).

Propagating the monomer info from the amino acids to the added Hs isn't too 
difficult (can call atom.GetNeighbors() and take the info from the neighbouring 
atom) - but there are also some preferred (or required?) naming and numbering 
conventions to adhere to ("H" for the backbone NH, "HA" for the hydrogen on the 
alpha carbon, etc).
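A minimal sketch of the workaround described above (copying monomer info from the neighbouring heavy atom), assuming RDKit's `AtomPDBResidueInfo` getters/setters and `Chem.MolFromSequence` for a toy Gly-Ala dipeptide. `propagate_residue_info` is a hypothetical helper, and the hydrogen atom name is a placeholder; the PDB naming conventions ("H", "HA", ...) are not attempted. Newer RDKit releases appear to offer an `addResidueInfo` option on `AddHs`; check the documentation for your version.

```python
from rdkit import Chem

def propagate_residue_info(mol_with_hs):
    """Copy residue-level PDB info from each heavy atom to its hydrogens (hypothetical helper)."""
    for atom in mol_with_hs.GetAtoms():
        if atom.GetAtomicNum() != 1 or atom.GetPDBResidueInfo() is not None:
            continue
        neighbors = atom.GetNeighbors()
        if len(neighbors) != 1:
            continue
        src = neighbors[0].GetPDBResidueInfo()
        if src is None:
            continue
        info = Chem.AtomPDBResidueInfo()
        info.SetName(' H  ')  # placeholder atom name, not the PDB convention
        info.SetResidueName(src.GetResidueName())
        info.SetResidueNumber(src.GetResidueNumber())
        info.SetChainId(src.GetChainId())
        info.SetInsertionCode(src.GetInsertionCode())
        info.SetIsHeteroAtom(src.GetIsHeteroAtom())  # ATOM vs HETATM follows the parent
        info.SetOccupancy(1.0)  # explicit occupancy instead of an unset value
        atom.SetMonomerInfo(info)
    return mol_with_hs

peptide = Chem.MolFromSequence('GA')  # glycyl-alanine with PDB residue info
peptide_h = propagate_residue_info(Chem.AddHs(peptide))
```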

Perhaps I am missing something here (a secret 'flavour' option? :)) - but if 
not, it would be interesting to hear what behaviour others would expect when 
adding explicit hydrogens (I think the same issues will relate to any sequence 
where monomer information is present).

Kind regards

James



[Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

2016-11-01 Thread 杨弘宾

Hi,
Supposing I'd like to match 100 substructures against 1000 compounds
represented as SMILES. What I did is:

suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
for j in range(ll):  # I have to make substructures in the first loop.
    for i in range(l):
        suppl[i].GetSubstructMatches(s[j])

and found the performance is not good.
Then I did a comparison and found that it was because the compounds were not
initialized. If I use MolFromSmiles, the performance improves a lot.

start = time.clock()
suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
print time.clock()-start   # >>> 0.0373735355168  indicating that the molecules were not initiated.
for i in range(l):
    suppl[i].GetSubstructMatches(sa)
    suppl[i].GetSubstructMatches(sa2)
print time.clock()-start   # >>> 11.1884715172

start = time.clock()
f = open('allmoleculenew.smi')
for i in range(l):
    mol = Chem.MolFromSmiles(f.next().split('\t')[0])
    mol.GetSubstructMatches(sa)
    mol.GetSubstructMatches(sa2)
print time.clock()-start   # >>> 5.44030582111

The second method was twice as fast as the first, indicating that the "init"
is more time-consuming than the matching. I think SmilesMolSupplier is a good
API for loading multiple compounds, but it does not parse the SMILES
immediately, which adds time to the later steps. So is it possible to
manually initialize the compounds?


Hongbin Yang 杨弘宾

Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy

East China University of Science and Technology


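To measure the effect discussed in this thread without the double-indexing cost Brian points out, a rough timing sketch can compare repeated `suppl[i]` lookups with a single pass that parses each molecule once. `time.clock()` was removed in Python 3.8, so this sketch assumes Python 3 and uses `time.perf_counter()`. The function names, temporary file and queries are placeholders, and the timings will vary by machine.

```python
import os
import tempfile
import time
from rdkit import Chem

def time_indexed(path, queries):
    """Mimic the original loop: index the supplier once per (query, molecule) pair."""
    suppl = Chem.SmilesMolSupplier(path, delimiter='\t', titleLine=False)
    start = time.perf_counter()
    for query in queries:
        for i in range(len(suppl)):
            suppl[i].GetSubstructMatches(query)  # each suppl[i] builds a new Mol
    return time.perf_counter() - start

def time_single_pass(path, queries):
    """Parse each molecule once by iterating, then run every query on it."""
    suppl = Chem.SmilesMolSupplier(path, delimiter='\t', titleLine=False)
    start = time.perf_counter()
    for mol in suppl:
        if mol is None:
            continue
        for query in queries:
            mol.GetSubstructMatches(query)
    return time.perf_counter() - start

# Placeholder data: 200 copies of phenol in a temporary tab-separated file.
with tempfile.NamedTemporaryFile('w', suffix='.smi', delete=False) as fh:
    fh.write('c1ccccc1O\tphenol\n' * 200)
    path = fh.name
queries = [Chem.MolFromSmarts('[OX2H]'), Chem.MolFromSmarts('c1ccccc1')]
try:
    t_indexed = time_indexed(path, queries)
    t_single = time_single_pass(path, queries)
finally:
    os.remove(path)
print(f'indexed: {t_indexed:.4f} s, single pass: {t_single:.4f} s')
```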