Re: [Rdkit-discuss] problems with installation on conda with python 3.5 32-bit
Hi Greg,

I am using 32-bit Python 2.7 and Anaconda (on 64-bit Windows 10), so I cannot update to the latest version and test as you proposed several days ago. Since this has not caused me any trouble, I plan to upgrade the whole environment at some point in the future. BTW, will it be necessary to upgrade to Python 3.6 in case RDKit stops supporting Python 2? I prefer 2.7, at least for now :)

Hongbin Yang

From: Greg Landrum
Date: 2017-02-20 23:02
To: Michal Krompiec
CC: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] problems with installation on conda with python 3.5 32-bit

Hi Michal,

We've only ever done python2.7 builds for win32 and we stopped doing those with the 2016.03 release. I will have to check, but I think I probably can start doing these again, though I'm reluctant due to the amount of effort required. How many users do you need to support who are stuck on 32-bit machines?

-greg

On Mon, Feb 20, 2017 at 2:18 PM, Michal Krompiec wrote:

Hello,

I can't install rdkit on anaconda with 32-bit python3 on Windows 7. When I try "the usual", conda tries to install python2.7 into the environment:

    >conda create -c rdkit -n my-rdkit-env rdkit
    Fetching package metadata .
    Solving package specifications: .

    Package plan for installation in environment C:\Anaconda3_32\envs\my-rdkit-env:

    The following NEW packages will be INSTALLED:

        boost:          1.56.0-py27_3 rdkit
        bzip2:          1.0.6-vc9_3 [vc9]
        mkl:            2017.0.1-0
        numpy:          1.11.3-py27_0
        pip:            9.0.1-py27_1
        python:         2.7.13-0
        rdkit:          2016.03.1-np111py27_1 rdkit
        setuptools:     27.2.0-py27_1
        vs2008_runtime: 9.00.30729.5054-0
        wheel:          0.29.0-py27_0
        zlib:           1.2.8-vc9_3 [vc9]

If I create an empty environment, load python 3.5 into it and try installing rdkit, I get an error:

    >conda create -n my-rdkit-env python=3.5
    Fetching package metadata ...
    Solving package specifications: .

    Package plan for installation in environment C:\Anaconda3_32\envs\my-rdkit-env:

    The following NEW packages will be INSTALLED:

        pip:            9.0.1-py35_1
        python:         3.5.2-0
        setuptools:     27.2.0-py35_1
        vs2015_runtime: 14.0.25123-0
        wheel:          0.29.0-py35_0

    Proceed ([y]/n)?
    #
    # To activate this environment, use:
    # > activate my-rdkit-env
    #
    # To deactivate this environment, use:
    # > deactivate my-rdkit-env
    #
    # * for power-users using bash, you must source
    #

    >conda install --name my-rdkit-env -f --channel https://conda.anaconda.org/rdkit rdkit
    Fetching package metadata .
    Solving package specifications: .

    UnsatisfiableError: The following specifications were found to be in conflict:
      - python 3.5*
      - rdkit -> python 2.7*
    Use "conda info <package>" to see the dependencies for each package.

I managed to install rdkit without any problems on the same machine in 64-bit anaconda with python3.5, but I need a separate 32-bit build to support users with 32-bit machines.

Any help will be appreciated.

Thanks and best regards,
Michal

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
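Because the conflict in this thread comes down to which interpreter version and architecture an environment runs, it can help to confirm both from inside the environment before installing anything. The following is a minimal sketch using only the standard library; the helper name `describe_interpreter` is made up for illustration and is not part of conda or RDKit.

```python
import struct
import sys


def describe_interpreter():
    """Return (major, minor, bits) for the running Python interpreter."""
    # The size of a C pointer in bytes, times 8, gives 32 or 64 on common platforms.
    bits = struct.calcsize("P") * 8
    return sys.version_info[0], sys.version_info[1], bits


if __name__ == "__main__":
    major, minor, bits = describe_interpreter()
    print("Python %d.%d, %d-bit" % (major, minor, bits))
```

Running this inside the activated environment shows whether it is the 32-bit or the 64-bit interpreter that conda is trying to satisfy.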
Re: [Rdkit-discuss] jupyter cracked when drawing with "abnormal" operation of rdMolDraw2D
Hi Greg,

DrawMolecules did not work on my computer; I guess it is a newer feature than my version (v2016.3.1) provides, isn't it? I also tried your second suggestion, but it still crashed. Maybe the old version of MolDraw2D is buggy. The following are made-up compounds for testing, and each one is drawable on its own.

    # mycompounds.smi
    SMILES
    c1cc1[N+]([O-])=O 1
    CCC[N+]([O-])=O 2
    CC[N+]([O-])=O 3
    C[N+]([O-])=O 4
    c1c(CC)1[N+]([O-])=O 5

    # code
    from rdkit import Chem
    from rdkit.Chem.Draw import rdMolDraw2D
    from rdkit.Chem import Draw
    from rdkit.Chem import rdDepictor
    from rdkit.Chem import rdMolTransforms
    mols = Chem.SmilesMolSupplier('mycompounds.smi',delimiter='\t')
    len(mols)
    mols = list(mols)
    def centerMol(mol):
        conf = mol.GetConformer()
        pt = rdMolTransforms.ComputeCentroid(conf)
        for i in range(conf.GetNumAtoms()):
            conf.SetAtomPosition(i,conf.GetAtomPosition(i) - pt)
    drawer = rdMolDraw2D.MolDraw2DSVG(400,400)
    smarts = Chem.MolFromSmarts('N(-O)=O')
    i=0
    for mol in mols:
        if not mol.HasSubstructMatch(smarts):
            continue
        rdDepictor.Compute2DCoords(mol)
        tm = Chem.Mol(mol)
        centerMol(tm)
        Draw.PrepareMolForDrawing(tm)
        #drawer.DrawMolecule(mol,highlightAtoms=mol.GetSubstructMatch(p))
        drawer.DrawMolecule(tm,highlightAtoms=tm.GetSubstructMatch(smarts))
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText().replace('svg:','')

Hongbin Yang

From: Greg Landrum
Date: 2017-02-18 12:59
To: 杨弘宾
CC: rdkit-discuss
Subject: Re: [Rdkit-discuss] jupyter cracked when drawing with "abnormal" operation of rdMolDraw2D

Hi,

On Fri, Feb 17, 2017 at 6:31 AM, 杨弘宾 <yanyangh...@163.com> wrote:

> Hi, everyone,
> I want to draw two molecules in a svg file with rdMolDraw2D. When I executed the following code, the jupyter cracked without any error or warning.
> ```
> drawer = rdMolDraw2D.MolDraw2DSVG(400,400)
> i=0
> for mol in mols:
>   if mol.HasSubstructMatch(smarts):
>     rdDepictor.Compute2DCoords(mol)
>     #if i == 1:
>     #  continue
>     drawer.DrawMolecule(mol,highlightAtoms=mol.GetSubstructMatch(smarts))
>     i+=1
>     if i > 1:
>       break
> drawer.FinishDrawing()
> svg = drawer.GetDrawingText().replace('svg:','')
> SVG(svg)
> ```
> It seems that we cannot directly draw two molecules with the same drawer? So how can I draw as I wanted?

Do you want to draw the molecules on top of each other (somewhat problematic at the moment, but doable) or in a grid?

If you want to have them in a grid, the solution is:

    drawer = rdMolDraw2D.MolDraw2DSVG(400,400,200,200)
    p = Chem.MolFromSmarts('c1n1')
    drawer.DrawMolecules(mols[:4])
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText().replace('svg:','')

That's an overall image size of 400x400 with 200x200 panes for the individual molecules. At the moment molecular highlighting does not work when you do this (there's a github item for that here: https://github.com/rdkit/rdkit/issues/1323)

If you want to put them on top of each other, what you show above should kind of work. You probably should center each of the molecules first though:

    from rdkit.Chem import rdMolTransforms
    def centerMol(mol):
        conf = mol.GetConformer()
        pt = rdMolTransforms.ComputeCentroid(conf)
        for i in range(conf.GetNumAtoms()):
            conf.SetAtomPosition(i,conf.GetAtomPosition(i) - pt)
    drawer = rdMolDraw2D.MolDraw2DSVG(400,400)
    p = Chem.MolFromSmarts('c1n1')
    i=0
    for mol in mols:
        tm = Chem.Mol(mol)
        centerMol(tm)
        Draw.PrepareMolForDrawing(tm)
        #drawer.DrawMolecule(mol,highlightAtoms=mol.GetSubstructMatch(p))
        drawer.DrawMolecule(tm,highlightAtoms=tm.GetSubstructMatch(p))
        i+=1
        if i > 1:
            break
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText().replace('svg:','')

Note that this can end up being somewhat ugly since the drawing code will determine the scaling factors to make the molecules fit in the canvas from the first molecule. I think it is fixable by adding some additional logic to DrawMolecules() but I'm going to have to look into it. That's this item in github: https://github.com/rdkit/rdkit/issues/1325

> By the way, I've no idea why it cracked. From the experiment of the commented code, I can conclude it was caused by the drawer. So is it possible to fix the bug, adding error or warning instead of "KernelRestarter: restarting kernel" in console.

I can't reproduce a crash there. Which version of the RDKit are you using?

-greg
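If separate images are acceptable, one way to sidestep any state shared between successive DrawMolecule() calls is to create a fresh drawer for every molecule. The sketch below is not what Greg proposes above (it neither superimposes the molecules nor lays them out in a grid), and the helper name `draw_each_separately` is made up for illustration. It assumes an RDKit version that provides `rdMolDraw2D.PrepareMolForDrawing`; the import is guarded so the snippet still loads where RDKit is missing.

```python
try:
    from rdkit import Chem
    from rdkit.Chem.Draw import rdMolDraw2D
except ImportError:  # RDKit not installed; the helper below is then unusable
    Chem = None


def draw_each_separately(mols, pattern, size=(400, 400)):
    """Return one SVG string per molecule that matches `pattern`."""
    svgs = []
    for mol in mols:
        if mol is None or not mol.HasSubstructMatch(pattern):
            continue
        # Works on a copy and adds 2D coordinates if they are missing.
        prepared = rdMolDraw2D.PrepareMolForDrawing(mol)
        # A new drawer per molecule, so nothing carries over between drawings.
        drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
        # Atom indices from the original molecule stay valid on the copy.
        drawer.DrawMolecule(prepared, highlightAtoms=mol.GetSubstructMatch(pattern))
        drawer.FinishDrawing()
        svgs.append(drawer.GetDrawingText().replace('svg:', ''))
    return svgs
```

Each returned string can be passed to `SVG(...)` in a notebook on its own.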
[Rdkit-discuss] jupyter cracked when drawing with "abnormal" operation of rdMolDraw2D
Hi, everyone,

I want to draw two molecules in a svg file with rdMolDraw2D. When I executed the following code, the jupyter kernel crashed without any error or warning.

```
drawer = rdMolDraw2D.MolDraw2DSVG(400,400)
i=0
for mol in mols:
  if mol.HasSubstructMatch(smarts):
    rdDepictor.Compute2DCoords(mol)
    #if i == 1:
    #  continue
    drawer.DrawMolecule(mol,highlightAtoms=mol.GetSubstructMatch(smarts))
    i+=1
    if i > 1:
      break
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
```

It seems that we cannot directly draw two molecules with the same drawer? So how can I draw what I want?

By the way, I have no idea why it crashed. From the experiment with the commented-out code, I can conclude that it was caused by the drawer. So is it possible to fix the bug, raising an error or warning instead of "KernelRestarter: restarting kernel" in the console?

Hongbin Yang
Re: [Rdkit-discuss] Error when building RdKit 2013_09_1
Hi, Menaka,

An easy way, I suggest, is to install boost via the package manager, for example:

    $ sudo apt-get install libboost-dev libboost-system-dev libboost-thread-dev libboost-serialization-dev libboost-python-dev libboost-regex-dev

When installing boost from source, remember to include the python library:

    $ ./bootstrap.sh --with-libraries=python,regex; ./b2; ./b2 install

Reference: http://www.rdkit.org/docs/Install.html#ubuntu-12-04-and-later

PS: Other problems may cause the error, for example:

Problem: Your system has a version of boost installed in /usr/lib, but you would like to force the RDKit to use a more recent one.

Solution: This can be solved by using cmake version 2.8.3 (or more recent) and providing the -D Boost_NO_SYSTEM_PATHS=ON argument:

    cmake -D BOOST_ROOT=/usr/local -D Boost_NO_SYSTEM_PATHS=ON ..

Reference: http://www.rdkit.org/docs/Install.html#frequently-encountered-problems

Hope it is helpful.

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology

From: Menaka Jayawardena
Date: 2016-09-06 00:05
To: rdkit-discuss
Subject: [Rdkit-discuss] Error when building RdKit 2013_09_1

Hello,

When I try to build the above version of RDKit, I get the following error.

    CMake Error at /usr/share/cmake-2.8/Modules/FindBoost.cmake:1131 (message):
      Unable to find the requested Boost libraries.

      Boost version: 1.55.0

      Boost include path: /home/menaka/Downloads/boost_1_55_0/include

      Could not find the following Boost libraries:

              boost_python

      No Boost libraries were found. You may need to set BOOST_LIBRARYDIR to the
      directory containing Boost libraries or BOOST_ROOT to the location of
      Boost.
    Call Stack (most recent call first):
      CMakeLists.txt:113 (find_package)

I'd be very grateful if someone could help me in solving this issue.

Best regards
Menaka

--
Menaka Madushanka Jayawardena
Faculty of Engineering, University of Peradeniyaya.
TP:- 071 885 1183/ 071 350 5470
[Rdkit-discuss] a SMILES that rdkit cannot read
Hi,

When I used rdkit to parse a smi file, I found that there was a SMILES that rdkit could not parse, and there was no error or warning.

version: Release_2016_03_1

    >>> mol = Chem.MolFromSmiles('N(CC(O)C1=C\C(=N/#N)\C(=O)C=C1)N=O')
    >>> print mol
    None

This compound can be read by OpenBabel, but I have no idea why it did not work here.

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
Re: [Rdkit-discuss] a SMILES that rdkit cannot read
Thanks. It's really an obvious error. I ran it in Jupyter via the browser. I tested both Docker/Ubuntu and Windows and found that Jupyter does not show the error, whereas IPython or python in a terminal (cmd) does. Why not raise a Python error or warning?

Hongbin Yang

From: Greg Landrum
Date: 2016-09-19 12:08
To: 杨弘宾
CC: rdkit-discuss
Subject: Re: [Rdkit-discuss] a SMILES that rdkit cannot read

Hi,

On Mon, Sep 19, 2016 at 5:33 AM, 杨弘宾 <yanyangh...@163.com> wrote:

> When I used rdkit to parse a smi file, I found that there was a SMILES that rdkit cannot parse, and no any error or warning.
> version: Release_2016_03_1
>     >>> mol = Chem.MolFromSmiles('N(CC(O)C1=C\C(=N/#N)\C(=O)C=C1)N=O')
>     >>> print mol
>     None
> This compound can be read by OpenBabel. But I have no idea why it didnot work.

I don't know why OpenBabel accepts it, but there's definitely an error in that SMILES. This part: "=N/#N" has a single bond (the "/") directly followed by a triple bond (the "#"), and that's not legal SMILES.

I do see an error message when I try the SMILES:

    In [2]: mol = Chem.MolFromSmiles('N(CC(O)C1=C\C(=N/#N)\C(=O)C=C1)N=O')
    [06:04:54] SMILES Parse Error: syntax error for input: 'N(CC(O)C1=C\C(=N/#N)\C(=O)C=C1)N=O'

and find it a bit odd that you don't. What operating system are you using and how are you running the code?

Best,
-greg
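As the session above shows, MolFromSmiles returns None for input it cannot parse, and the parse message may not be visible in every front end. For code that prefers a Python exception, a thin wrapper is one option. This is only a sketch: the name `parse_strict` is made up for illustration, and the parser is passed in as an argument so the wrapper itself does not depend on RDKit.

```python
def parse_strict(smiles, parser):
    """Call `parser(smiles)` and raise ValueError if it returns None."""
    mol = parser(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))
    return mol


# Intended use with RDKit (assumes it is installed):
#     from rdkit import Chem
#     mol = parse_strict('N(CC(O)C1=C\\C(=N/#N)\\C(=O)C=C1)N=O', Chem.MolFromSmiles)
# For that input the call raises ValueError instead of silently returning None.
```

The same wrapper works with Chem.MolFromSmarts, which also returns None for patterns it cannot parse.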
[Rdkit-discuss] The fragmentMatcher (SubstructMatcher) is not as good as expected
Hi,

I tried using rdkit to match fragments against compounds, only to find that rdkit did not perform well with SMARTS. The following is the notebook I worked in.

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import FragmentMatcher
    from rdkit.Chem.Draw import IPythonConsole

    In [49]: p = FragmentMatcher.FragmentMatcher()
             p.Init('[a!r0][NX3+](=[OX1])([O-])')
    In [50]: mol = Chem.MolFromSmiles('c1c1[N+](=O)[O-]')
             mol
    Out[50]:
    In [51]: p.HasMatch(mol)
    Out[51]: 0
    In [52]: print Chem.MolFromSmarts('[a!r0][NX3+](=[OX1])([O-])')
    None

However, openbabel worked well in matching the substructure. Even the "or" operator was available, as in "[a!r0][$([NX3+](=[OX1])([O-])),$([NX3](=O)=O)]".

    >>> s=pybel.Smarts('[a!r0][NX3+](=[OX1])([O-])')
    >>> s=pybel.Smarts('[a!r0][NX3+](=[OX1])([O-])')
    >>> a=pybel.readstring('smi','c1c1[N+](=O)[O-]')
    >>> s.findall(a)
    [(6, 7, 8, 9)]

It is a pity that rdkit can calculate the topological distance between two atoms while it cannot match the fragments... Is there any better API that I did not find?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
Re: [Rdkit-discuss] The fragmentMatcher (SubstructMatcher) is not as good as expected
Thanks, it works! I appreciate that RDKit is so strict in its representation of molecules and substructures. I have learned a lot on the mailing list.

Hongbin Yang

From: Paolo Tosco
Date: 2016-10-27 17:19
To: 杨弘宾; rdkit-discuss
Subject: Re: [Rdkit-discuss] The fragmentMatcher (SubstructMatcher) is not as good as expected

Dear Hongbin,

I am afraid the SMARTS you are using is not valid, as no SSSR can have less than 3 terms, or it wouldn't be a ring. If you change [a!r0] into, for instance, [a!r3], then you'll find the match you are looking for.

Cheers,
p.

On 27/10/2016 09:36, 杨弘宾 wrote:

Hi,

I tryied using rdkit to match fragments with compounds only to find that rdkit performed not well in SMARTS. The following is the notebook I worked.

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import FragmentMatcher
    from rdkit.Chem.Draw import IPythonConsole

    In [49]: p = FragmentMatcher.FragmentMatcher()
             p.Init('[a!r0][NX3+](=[OX1])([O-])')
    In [50]: mol = Chem.MolFromSmiles('c1c1[N+](=O)[O-]')
             mol
    Out[50]:
    In [51]: p.HasMatch(mol)
    Out[51]: 0
    In [52]: print Chem.MolFromSmarts('[a!r0][NX3+](=[OX1])([O-])')
    None

However, openbabel worked well in matching the substrcutre. Even "or operator" was avaiable such as "[a!r0][$([NX3+](=[OX1])([O-])),$([NX3](=O)=O)]".

    >>> s=pybel.Smarts('[a!r0][NX3+](=[OX1])([O-])')
    >>> s=pybel.Smarts('[a!r0][NX3+](=[OX1])([O-])')
    >>> a=pybel.readstring('smi','c1c1[N+](=O)[O-]')
    >>> s.findall(a)
    [(6, 7, 8, 9)]

It is a pity that rdkit can calculate the topological distance between two atoms while it cannot match the fragments... Is there any better API which I didn't find?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
Re: [Rdkit-discuss] 2D drawing with atoms labeled by index
Hi, Peter S. Shenkin,

I think this blog post may help you draw molecules with labels; it also explains more about drawing with rdMolDraw2D.
http://rdkit.blogspot.com/2015/02/new-drawing-code.html

Hongbin Yang

From: Peter S. Shenkin
Date: 2016-10-24 10:18
To: Dimitri Maziuk; RDKit Discuss
Subject: [Rdkit-discuss] 2D drawing with atoms labeled by index

Hi,

How do you get RDKit to label the atoms in a 2D drawing with their indices? There was some discussion of this that included Dimitri Maziuk in September, but it wasn't clear to me whether he actually had to modify the underlying drawing code to get this behavior.

-P.
Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index
Hi, Peter,

I don't know whether this can help you, since I did not rerun your code. But it actually works on my computer: change the file extension from .svg to .html and open the file in Chrome. This should work with svg2.svg (the one from which the svg namespace was removed).

Hongbin Yang

From: Peter S. Shenkin
Date: 2016-10-25 13:27
To: Dmitri Maziuk
CC: RDKit Discuss
Subject: Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index

Hi,

Dima wrote: Try saving the text (svg/svg2) to a file and opening it in chrome (if you can actually open a file in chrome) or some other application.

I actually did that, and in a second email I reported:

Chrome thinks svg.svg is empty.
When I load svg2.svg, Chrome complains, "This XML file does not appear to have any style information associated with it. The document tree is shown below"

-P.

On Tue, Oct 25, 2016 at 12:28 AM, Dmitri Maziuk wrote:

OK, my turn: that went out too soon.

It seems to me that jupyter, ipython, or whatever, has no idea how to render MIME type image/svg+xml. It can display an "SVG" object, but the bit that turned image/svg+xml into "SVG" does not understand XML namespaces (that's been around since at least 2009).

Try saving the text (svg/svg2) to a file and opening it in chrome (if you can actually open a file in chrome) or some other application.

Dima
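The workaround discussed in this thread (drop the `svg:` element prefix, then open the markup in a browser as an .html file) can be sketched with the standard library alone. The function names and the sample markup below are made up for illustration; only the `.replace('svg:', '')` step comes from the thread.

```python
import os
import tempfile


def strip_svg_prefix(svg_text):
    """Remove 'svg:' element prefixes, i.e. the .replace('svg:', '') step."""
    return svg_text.replace('svg:', '')


def save_for_browser(svg_text, path):
    """Write prefix-free SVG markup to `path` (for example a .html file)."""
    with open(path, 'w') as handle:
        handle.write(strip_svg_prefix(svg_text))
    return path


if __name__ == "__main__":
    # Hypothetical sample; real input would come from drawer.GetDrawingText().
    sample = "<svg:svg><svg:rect width='10' height='10'/></svg:svg>"
    target = os.path.join(tempfile.gettempdir(), 'mol.html')
    print(save_for_browser(sample, target))
```

Opening the written file in a browser then shows the drawing, since inline `<svg>` elements are rendered in HTML documents.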
[Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.
Hi,

Suppose I'd like to match 100 substructures against 1000 compounds represented as SMILES. What I did is:

    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
    l = len(suppl)
    for j in range(ll):  # I have to make substructures in the first loop.
        for i in range(l):
            suppl[i].GetSubstructMatches(s[j])

and I found that the performance is not good. Then I did a comparison and found that it was because the compounds had not been instantiated. If I use MolFromSmiles, the performance improves a lot.

    start = time.clock()
    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
    l=len(suppl)
    print time.clock()-start   # >>> 0.0373735355168  indicating that the molecules were not initiated.
    for i in range(l):
        suppl[i].GetSubstructMatches(sa)
        suppl[i].GetSubstructMatches(sa2)
    print time.clock()-start   # >>> 11.1884715172

    start = time.clock()
    f = open('allmoleculenew.smi')
    for i in range(l):
        mol = Chem.MolFromSmiles(f.next().split('\t')[0])
        mol.GetSubstructMatches(sa)
        mol.GetSubstructMatches(sa2)
    print time.clock()-start # >>> 5.44030582111

The second method was twice as fast as the first, indicating that the "init" is more time-consuming than the matching. I think SmilesMolSupplier is a good API for loading multiple compounds, but it does not parse the SMILES immediately, which adds time to the subsequent steps. So is it possible to instantiate the compounds manually?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.
Hi, Brian,

The first point you mentioned was actually what I guessed and it is deprecated in my context, I think. Thanks for the second suggestion; I tried it and the performance improved:

    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
    l = len(suppl)  # This line is crucial
    suppl = list(suppl)

And the types of suppl are respectively: , ,

So, although the second suppl (after len(suppl)) is indexable, it is not actually a list. It is amazing that all the molecules were instantiated by the `list` call. : )

Hongbin Yang

From: Brian Kelley
Date: 2016-11-01 19:56
To: 杨弘宾
CC: rdkit-discuss
Subject: Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

I'll make two more points (thanks to Greg Landrum for pointing this out).

1) In your code each call to suppl[i] makes a new molecule, so calling it twice in a row is twice as slow. This explains your last result.

2) In my example, I was assuming that the queries were already in a python list and not from a supplier. If they are being read from a supplier, you can easily keep them all in memory with:

    queries = list(query_supplier)

Note that for large files, this can take up a lot of memory.

Thanks for the clarification Greg.

Brian Kelley

On Nov 1, 2016, at 4:22 AM, 杨弘宾 <yanyangh...@163.com> wrote:

Hi,

Supposing I'd like to matching 100 substructures with 1000 compounds represented as smiles. What I did is:

    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
    l = len(suppl)
    for j in range(ll):  # I have to make substructures in the first loop.
        for i in range(l):
            suppl[i].GetSubstructMatches(s[j])

and found the performance is not good. Then I did a comparison and found that it was because the conformation of the compounds where not initiated. If I use MolFromSmiles, the performance will improve a lot.

    start = time.clock()
    suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
    l=len(suppl)
    print time.clock()-start   # >>> 0.0373735355168  indicating that the molecules were not initiated.
    for i in range(l):
        suppl[i].GetSubstructMatches(sa)
        suppl[i].GetSubstructMatches(sa2)
    print time.clock()-start   # >>> 11.1884715172

    start = time.clock()
    f = open('allmoleculenew.smi')
    for i in range(l):
        mol = Chem.MolFromSmiles(f.next().split('\t')[0])
        mol.GetSubstructMatches(sa)
        mol.GetSubstructMatches(sa2)
    print time.clock()-start # >>> 5.44030582111

The second method was double faster than the first, indicating that the "init" is more time consuming compared to matching. I think SmilesMolSupplier is a good API to load multiple compounds but it didnot parse the smiles immediately, which adds the time complexity to the further application. So is it possible to manually initiate the compounds?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
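The effect described in this thread (every `suppl[i]` builds a new molecule, whereas `list(suppl)` builds each one once) can be illustrated without RDKit. The class below is a toy stand-in written for this note; it is not RDKit's implementation, and "parsing" here just means taking the first tab-separated column.

```python
class LazySupplier(object):
    """Toy stand-in for a lazy supplier that re-parses on every index access."""

    def __init__(self, lines):
        self._lines = list(lines)
        self.parse_count = 0  # how many times an entry has been "parsed"

    def __len__(self):
        return len(self._lines)

    def __getitem__(self, index):
        line = self._lines[index]  # IndexError past the end stops iteration
        self.parse_count += 1
        return line.split('\t')[0]


suppl = LazySupplier(['CCO\t1', 'CCN\t0'])
suppl[0]
suppl[0]                      # same entry, parsed a second time
count_after_indexing = suppl.parse_count   # 2

mols = list(suppl)            # parses each entry exactly once more
mols[0]
mols[0]                       # plain list access, no further parsing
count_after_list = suppl.parse_count       # 4
```

Materialising the supplier once and looping over the resulting list therefore avoids paying the parsing cost inside the inner matching loop, at the price of holding every entry in memory.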
[Rdkit-discuss] property of name in smilesMolSupplier
Hi,

I spent a lot of time exploring how to get the name property when using SmilesMolSupplier. I had a smiles file like this:

    ===
    c1c1\t1\n
    ...
    ===

where 1 means that the compound is positive. I wanted to read this smiles file via SmilesMolSupplier, and I knew that the second column is the default name column.

    suppl = Chem.SmilesMolSupplier('compounds.smi',delimiter='\t',titleLine=False)

However, I could not get the property with the value 1 because I had no idea what the property name was. The documentation says:

    If the input file has a title line and more than two columns (smiles and id), the additional columns will be used to set properties on each molecule. The properties are accessible using the mol.GetProp(propName) method.

But mine was a table without a title line, so I thought the property name for the names would be "Name" or "name" by default. And I failed... After reading the source code, I found that it should be _Name:
https://github.com/rdkit/rdkit/blob/f4529c910e546af590c56eba01f96e9015c269a6/Code/GraphMol/FileParsers/SmilesMolSupplier.cpp#L194

I think the documentation should be improved so that others can find out how to get the name of each compound.

BTW, I tried to use suppl[0].GetPropNames() but only got ",class std::allocator > at 0x7402e90>", which seemed to tell me nothing. Is there any way to make it readable in python?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
Re: [Rdkit-discuss] property of name in smilesMolSupplier
Hi Paolo,

Thank you for your answer. I tried it and found an interesting issue:

    >>> mol = suppl[1]
    >>> print mol.GetProp('_Name')
    >>> print list(mol.GetPropNames())
    ---output---
    1
    []

I guess that the _Name property is a hidden variable. Maybe users cannot find this property name until they read the source code, or unless they define the property name by adding a header line.

Hongbin Yang 杨弘宾

From: Paolo Tosco
Date: 2016-10-13 18:56
To: 杨弘宾; rdkit-discuss
Subject: Re: [Rdkit-discuss] property of name in smilesMolSupplier

Hi Hongbin,

suppl[0].GetPropNames() is an iterable object, so you can use it in for loops such as:

    for i in suppl[0].GetPropNames():
      print (i)

or you may convert it to a list:

    l = list(suppl[0].GetPropNames())
    print (l)

Cheers,
p.

On 10/13/16 11:31, 杨弘宾 wrote:

Hi,

I spent a lot of time to explorer "How to get the property of name when using SmilesMolSupplier". I had a smiles file like this:

    ===
    c1c1\t1\n
    ...
    ===

where 1 means that it is positve. So I wanted to read this smiles file via SMilesMolSupplier and I knew that the second column is the default name column.

    suppl = Chem.SmilesMolSupplier('compounds.smi',delimiter='\t',titleLine=False)

However, I could not get the property of 1 because I had no idea what the property_name was. In the document, it shows that:

    If the input file has a title line and more than two columns (smiles and id), the additional columns will be used to set properties on each molecule. The properties are accessible using the mol.GetProp(propName) method.

But It was a "no title table", So I thought its property_name of "their names" should be "Name" or "name" as default. And I failed... After read the source code, I found that it should be _Name
https://github.com/rdkit/rdkit/blob/f4529c910e546af590c56eba01f96e9015c269a6/Code/GraphMol/FileParsers/SmilesMolSupplier.cpp#L194

I think the document should be improved so that others may know how to get the name of each compound.

BTW, I tried to use suppl[0].GetPropNames() but only to get ",class std::allocator > at 0x7402e90>" that seemed tell nothing. I wondered that is there any way to make it readable in python?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
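The observation in this thread, that `_Name` is readable through GetProp() but absent from `list(mol.GetPropNames())`, is consistent with names beginning with an underscore being treated as private. The sketch below assumes an RDKit version whose GetPropNames() accepts the `includePrivate` keyword; the helper name `visible_and_private_names` is made up for illustration, and the import is guarded so the snippet still loads where RDKit is missing.

```python
try:
    from rdkit import Chem
except ImportError:  # RDKit not installed; the helper below is then unusable
    Chem = None


def visible_and_private_names(mol):
    """Return (default property names, names including private ones)."""
    default_names = list(mol.GetPropNames())
    # Assumption: this RDKit version supports the includePrivate keyword.
    all_names = list(mol.GetPropNames(includePrivate=True))
    return default_names, all_names


if Chem is not None:
    example = Chem.MolFromSmiles('CCO')
    example.SetProp('_Name', '1')  # the property a supplier's name column fills
    default_names, all_names = visible_and_private_names(example)
    print(default_names, all_names)
```

With such a version, `_Name` appears only in the second list, which matches the empty list seen in the session above.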
[Rdkit-discuss] Cannot import rdBase after installed rdkit by source in a non-administrator linux cluster
Hi, rdkiters,

Have you tried installing rdkit from source? It was fine when I installed rdkit with conda on my PC. But now I am trying to install it on a server where I am only a regular user who cannot use "sudo", and where "python" is in a read-only directory. Here is my cmake command:

    ~applic/cmake/bin/cmake -D PYTHON_LIBRARY=/home/yccai/Programs/Anaconda/lib/python2.7/config/libpython2.7.a -D PYTHON_INCLUDE_DIR=/home/yccai/Programs/Anaconda/include/python2.7 -D PYTHON_EXECUTABLE=/home/yccai/Programs/Anaconda/bin/python -D BOOST_ROOT=/home/yccai/Programs/Anaconda -D Boost_NO_SYSTEM_PATHS=ON ..

And the output:

    -- The C compiler identification is GNU 4.1.2
    -- The CXX compiler identification is GNU 4.1.2
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check if the system is big endian
    -- Searching 16 bit integer
    -- Looking for sys/types.h
    -- Looking for sys/types.h - found
    -- Looking for stdint.h
    -- Looking for stdint.h - found
    -- Looking for stddef.h
    -- Looking for stddef.h - found
    -- Check size of unsigned short
    -- Check size of unsigned short - done
    -- Using unsigned short
    -- Check if the system is big endian - little endian
    -- Found PythonInterp: /home/yccai/Programs/Anaconda/bin/python (found version "2.7.12")
    -- Found PythonLibs: /home/yccai/Programs/Anaconda/lib/python2.7/config/libpython2.7.a (found version "2.7.12")
    -- Boost version: 1.56.0
    -- Found the following Boost libraries:
    --   python
    -- Could NOT find Eigen3 (missing: EIGEN3_INCLUDE_DIR EIGEN3_VERSION_OK) (Required is at least version "2.91.0")
    Eigen3 not found, disabling the Descriptors3D build.
    -- Looking for include file pthread.h
    -- Looking for include file pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Boost version: 1.56.0
    -- Found the following Boost libraries:
    --   thread
    --   system
    -- Boost version: 1.56.0
    -- Found the following Boost libraries:
    --   serialization
    == Using strict rotor definition
    == Updating Filters.cpp from pains file
    == Done updating pains files
    -- Boost version: 1.56.0
    -- Found the following Boost libraries:
    --   regex
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/hbyang/applic/rdkit-Release_2016_09_4/build

There was no error in `make` and `make install`. But when I ran `from rdkit import rdBase`, this error occurred:

    ImportError: /home/yccai/Programs/Anaconda/bin/../lib/libboost_serialization.so.1.56.0: undefined symbol: _ZN5boost13serialization6detail17singleton_wrapperINS_7archive6detail12extra_detail3mapINS3_15binary_oarchive14m_is_destroyedE

I tried an older version of rdkit and got a similar error (libboost_python.so.1.56.0). I don't think the problem is with the boost from conda, but I am not sure.

So what can I do to install it (or to find out where the problem is)? I want to use rdkit to draw molecules in my tool. Is there any alternative way to do so?

Hongbin Yang
Re: [Rdkit-discuss] delete a substructure
Hi Chenyang,

I think your issue was caused by the definition of "-OH (phenol)". If you define this pattern as "cO", atom 3 will be matched too, since it is the aromatic carbon bonded to the oxygen. I guess you wanted to match only the oxygen, restricted to being bonded to an aromatic carbon. So the SMARTS should be "[$(Oc)]", which means an oxygen in the environment "bonded to an aromatic carbon".

m = Chem.MolFromSmiles('CC1=CC(=C(C=C1)C(=O)O)O')
m.GetSubstructMatches(Chem.MolFromSmarts('[$(Oc)]'))
>>> ((10,),)

Then only atom 10 will be matched, and it won't interfere with the other counts.

Reference: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html (section 4.4)

Hongbin Yang

From: Chenyang Shi
Date: 2017-03-09 01:32
To: Greg Landrum
CC: rdkit-discuss; 杨弘宾
Subject: Re: [Rdkit-discuss] delete a substructure

Dear Hongbin,

I tried your method on a molecule, 4-Methylsalicylic acid (CC1=CC(=C(C=C1)C(=O)O)O). I looped through all groups defined in the Joback method (using SMARTS) and used m.GetSubstructMatches to print out all atom positions. The result is summarized in the table. We can see there are duplicated counts coming from the COOH group. As suggested by Hongbin, we can remove duplicated atoms by looking at their positions: in this case, ((9,),), ((7, 8),), ((7,),), and ((8,),) are subsets of ((7, 8, 9),) from -COOH. Indeed we can get rid of these duplicates.

However, I also noticed that atom (3,) from the =C< (ring) group is also a part of -OH (phenol), ((10, 3),). If we apply the same algorithm to remove duplicates, the =C< (ring) group will be counted only twice instead of three times.

Greg, you mentioned that as an alternative I can delete a substructure using the chemical reaction method. It would be greatly appreciated if you could show me (or point me to) a simple example, perhaps on a simple molecule. I find myself at a loss when browsing the manual. I would like to try that direction as well.

Thanks,
Chenyang

On Mon, Mar 6, 2017 at 1:52 AM, Greg Landrum <greg.land...@gmail.com> wrote:

The solution that Hongbin proposes to the double-counting problem is a good one. Just be sure to sort your substructure queries in the right order so that the more complex ones come first.

Another thing you might think about is making your queries more specific. For example, as you pointed out, "[OH]" is very general and matches parts of carboxylic acids and a number of other functional groups. The RDKit has a set of fairly well tested (though certainly not perfect) functional group definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol definition from there looks like this:

[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]

-greg
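One possible reading of the "chemical reaction" suggestion, not necessarily the approach Greg had in mind, is to replace the matched group with a dummy atom instead of deleting it, so that the attachment carbon keeps its hydrogen count. A minimal sketch on CH3COOH; the reaction SMARTS below is an assumption written for this example only:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

m = Chem.MolFromSmiles('CC(=O)O')

# Replace -COOH with a dummy atom "*": the mapped carbon keeps one neighbour,
# so it stays CH3 instead of turning into CH4.
rxn = AllChem.ReactionFromSmarts('[C:1]-C(=O)[OH]>>[C:1]-*')

# One tuple of products per way the reactant template matches the molecule.
products = rxn.RunReactants((m,))
remainder = products[0][0]

# Reaction products are not sanitized automatically.
Chem.SanitizeMol(remainder)

# The --CH3 query can now still find the methyl group.
methyl = Chem.MolFromSmarts('[CH3;X4]')
n_methyl = len(remainder.GetSubstructMatches(methyl))
print(Chem.MolToSmiles(remainder), n_methyl)
```

With DeleteSubstructs() the leftover carbon becomes 'C' (CH4) and the --CH3 query no longer matches; keeping a placeholder atom avoids that, at the cost of leaving dummy atoms in the remainder.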
Re: [Rdkit-discuss] delete a substructure
Hi, Chenyang,

You don't need to delete the substructure from the molecule. Just check whether the mapped atoms have already been matched. For example:

m = Chem.MolFromSmiles('CC(=O)O')
OH = Chem.MolFromSmarts('[OH]')
COOH = Chem.MolFromSmarts('C(O)=O')
m.GetSubstructMatches(OH)
>> ((3,),)
m.GetSubstructMatches(COOH)
>> ((1, 3, 2),)

Since atom "3" has already been matched, it should be ignored. So you can create a "set" to record the matched atoms and avoid counting them twice.

Hongbin Yang 杨弘宾

From: Chenyang Shi
Date: 2017-03-06 14:04
To: Greg Landrum
CC: RDKit Discuss
Subject: Re: [Rdkit-discuss] delete a substructure

Hi Greg,

Thanks for the prompt reply. I did try "GetSubstructMatches()" and it returns the correct numbers of substructures for CH3COOH. The potential problem with this approach is that as the molecule gets more complicated, it may produce duplicate counts for certain functional groups. For example, the --OH (alcohol) group will likely also be counted in --COOH. A safer way, in my mind, is to remove the substructure that has been counted.

Greg, you mentioned the "chemical reaction functionality". Can you show me a demo script for that, using CH3COOH as an example? I will definitely delve into the manual to learn more, but reading your code will be a good start.

Thanks,
Chenyang

On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum <greg.land...@gmail.com> wrote:

Hi Chenyang,

If you're really interested in counting the number of times the substructure appears, you can do that much quicker with `GetSubstructMatches()`:

In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
Out[3]: 2

Is that sufficient, or do you actually want to sequentially remove all of the groups in your list? If you actually want to remove them, you are probably better off using the chemical reaction functionality instead of DeleteSubstructs(), which recalculates the number of implicit Hs on atoms after each call.

-greg

On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi <cs3...@columbia.edu> wrote:

I am new to rdkit but I am already impressed by its vibrant community. I have a question regarding deleting a substructure. In the RDKit documentation, this snippet of code describes how to delete a substructure:

>>> m = Chem.MolFromSmiles("CC(=O)O")
>>> patt = Chem.MolFromSmarts("C(=O)[OH]")
>>> rm = AllChem.DeleteSubstructs(m, patt)
>>> Chem.MolToSmiles(rm)
'C'

This block of code first loads the molecule CH3COOH from its SMILES, then defines the substructure COOH to be deleted using SMARTS. After the final line, the program outputs 'C', in SMILES form.

I wanted to develop a method for counting the groups in a molecule. In the CH3COOH case, I can count the --CH3 and --COOH groups using their respective SMARTS with no problem. However, when the molecule becomes more complicated, it is preferable to delete each substructure that has been found before moving on to the next SMARTS search. In the current case, after searching for the -COOH group and deleting it, the leftover is 'C', which is essentially CH4 instead of --CH3, so I cannot proceed with the SMARTS search for --CH3 ([CH3;A;X4!R]). Is there any way to work around this?

Thanks,
Chenyang
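The "set of matched atoms" idea from this thread can be sketched in plain Python. This is only an illustration under stated assumptions: count_groups is a hypothetical helper, the patterns are assumed to be ordered with the more complex ones first, and a match is skipped as soon as any of its atoms has already been claimed.

```python
def count_groups(matches_by_group):
    """Count each group, skipping matches whose atoms were already claimed.

    matches_by_group: list of (name, matches) pairs, most complex pattern
    first; each matches value is a tuple of atom-index tuples, in the form
    returned by GetSubstructMatches().
    """
    used = set()   # atom indices already assigned to some group
    counts = {}
    for name, matches in matches_by_group:
        counts[name] = 0
        for match in matches:
            # Accept the match only if none of its atoms is already used.
            if used.isdisjoint(match):
                counts[name] += 1
                used.update(match)
    return counts


# Match tuples quoted in the thread for CH3COOH ('CC(=O)O').
matches = [
    ('COOH', ((1, 3, 2),)),
    ('OH', ((3,),)),
]
counts = count_groups(matches)
print(counts)  # {'COOH': 1, 'OH': 0}
```

The skip-on-any-overlap rule is a design choice. It is also what drops a group that shares only its anchor atom with an earlier match, so making the queries more specific remains the cleaner fix for that case.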
Re: [Rdkit-discuss] Chem.MolToSmarts(mol) out put
Hi, Alexis,

Why don't you output the molecule in SMILES format, since the SMARTS you wanted to output was really a molecule? It is also possible to remove the dummy atom via the mol object. For example:

In [26]: x = Chem.MolFromSmiles('CC*')
In [27]: x.GetSubstructMatch(Chem.MolFromSmiles('*'))
Out[27]: (2,)
In [28]: mw = Chem.RWMol(x)
In [29]: mw.RemoveAtom(2)
In [30]: Chem.MolToSmiles(mw)
Out[30]: 'CC'

But I don't know whether this can handle SMARTS. I hope it is helpful.

Hongbin Yang

On 14 Jul 2017, at 4:59 PM, Alexis Parenty wrote:

> Dear Rdkiters,
>
> I sometimes get SMARTS from a mol in atomic number notation, such as:
>
> [#6]-[#7+]1=[#6]2-[#6]3:[#7]:[#6]:[#6]:[#6]:[#6]:3-[#6]3:[#6]:[#6]:[#6]:[#6]:[#6]:3-[#7]-2-[#6]-[#6]-1
>
> Is there a way to force the method Chem.MolToSmarts(mol) to output a SMARTS using alphabetic letters instead of atomic numbers?
>
> The reason I am asking is that I need to remove dummy atoms [*] from SMARTS using string manipulation, and I would rather use only one method. Unless I can do that from the mol object?
>
> Many thanks,
>
> Alexis
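When a molecule carries more than one dummy atom, the RemoveAtom approach above needs some care, because removing an atom changes the indices of the atoms that follow it. A minimal sketch, assuming the input was parsed from SMILES; strip_dummies is a hypothetical helper, and its behaviour on query molecules parsed from SMARTS has not been checked here:

```python
from rdkit import Chem


def strip_dummies(mol):
    """Return a copy of mol with every dummy atom (atomic number 0) removed."""
    rw = Chem.RWMol(mol)
    dummy_idxs = [a.GetIdx() for a in rw.GetAtoms() if a.GetAtomicNum() == 0]
    # Remove from the highest index down so the remaining indices stay valid.
    for idx in sorted(dummy_idxs, reverse=True):
        rw.RemoveAtom(idx)
    out = rw.GetMol()
    # Recompute valences and implicit hydrogens on the edited molecule.
    Chem.SanitizeMol(out)
    return out


core = strip_dummies(Chem.MolFromSmiles('*CC*'))
print(Chem.MolToSmiles(core))
```

Working on the mol object this way avoids string manipulation of the SMARTS altogether, which is the option Alexis asks about at the end of the quoted message.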