Re: [Rdkit-discuss] Clustering - visualization?
Thanks, Curt! I'll give those a look. It'll give me a very good reason to start digging into SciPy a bit more and exploit the added functionality that will bring.

Regarding my original question, and for anyone else who might be interested: I did indeed find an answer through a lot of code dredging. I found the Murtagh.ClusterData() function in RDKit, and was able to generate clusters from that. The function returns a single-member list, that single member being a Cluster object. I can feed that object to ClusterVis.ClusterToImg to get the dendrogram I wanted. Here's a short code snip showing the pieces.

    ...
    c_tree = Murtagh.ClusterData(dists, nfps, Murtagh.WARDS, isDistData=True)
    ...
    rdkit.ML.Cluster.ClusterVis.ClusterToImg(c_tree[0], size=(500,500), fileName='test.png')
    ...

I can then break the cluster tree into subtrees:

    ...
    rdkit.ML.Cluster.ClusterUtils.SplitIntoNClusters(c_tree[0], 5)
    ...

And I've written a short function to extract the individual structure memberships for each group:

    ...
    groups = ClusterUtils.SplitIntoNClusters(c_tree[0], 5)

    def GetGroupMembers(grp, memberlist=None):
        # Start a fresh list on each top-level call; a mutable default
        # argument would be shared between calls.
        if memberlist is None:
            memberlist = []
        for child in grp.GetChildren():
            if child.GetData() is None:
                # internal node: recurse into the subtree
                GetGroupMembers(child, memberlist)
            else:
                # leaf node: record its data
                memberlist.append(child.GetData())
        return memberlist

    print(GetGroupMembers(groups[0]))

On Sat, May 14, 2016 at 11:21 AM, Curt Fischer <curt.r.fisc...@gmail.com> wrote:

> Hi Robert,
>
> For the number of molecules you are interested in, it's viable to use
> SciPy / NumPy clustering functions instead of rdkit's built in C-linked
> functions. This approach will probably not be as fast as rdkit's built-in
> clustering functionalities, and will probably not scale to tens of
> thousands of molecules as well as rdkit's functions, but if you use SciPy
> or NumPy in other types of technical computing, this approach may be more
> transparent, generalizable, and easier to use.
>
> I have an example Jupyter notebook in GitHub that describes what I mean;
> here are the GitHub and nbviewer links:
>
> https://github.com/tentrillion/ipython_notebooks/blob/master/chemical_similarity_in_python.ipynb
>
> https://nbviewer.jupyter.org/github/tentrillion/ipython_notebooks/blob/master/chemical_similarity_in_python.ipynb
>
> Here are some of the most important parts of the code for generating a
> dendrogram.
>
> 1. Generate a numpy fingerprint matrix from a list of rdkit Molecules.
>
>     for smiles in smiles_list:
>         mol = Chem.MolFromSmiles(smiles)
>         mols.append(mol)
>     fingerprint_mat = np.vstack([np.asarray(rdmolops.RDKFingerprint(mol, fpSize=2048), dtype='bool') for mol in mols])
>
> 2. Generate the distance matrix. *pdist* and *squareform* are from
> *scipy.spatial.distance*.
>
>     dist_mat = pdist(fingerprint_mat, 'jaccard')
>     dist_df = pd.DataFrame(squareform(dist_mat), index=smiles_list, columns=smiles_list)
>
> As far as I can tell, the Jaccard distance is equivalent to one minus the
> Tanimoto similarity.
>
> 3. Perform hierarchical clustering on the distance matrix and show the
> dendrogram (see the github notebook for the plot). *hc* is
> *scipy.cluster.hierarchy*.
>
>     z = hc.linkage(dist_mat)
>     dendrogram = hc.dendrogram(z, labels=dist_df.columns, leaf_rotation=90)
>     plt.show()
>
> A helpful page for dendrograms using SciPy is this one:
> https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/
>
> Good luck!
>
> Curt
>
> On Sat, May 14, 2016 at 9:11 AM, Robert DeLisle <rkdeli...@gmail.com> wrote:
>
>> Next up is clustering...
>>
>> I've got about 350 structures to cluster and I've worked through the
>> example code from the RDKit Cookbook (
>> http://www.rdkit.org/docs/Cookbook.html#clustering-molecules). All
>> seems well and good there, but I would like to see the dendrogram. I see
>> that there is a ClusterVis module to generate images, PDF, and SVG, but all
>> require a Cluster object as input. I don't find anywhere a description of
>> acquiring or building that object based upon the results of clustering.
>>
>> Any tips?
>>
>> -Kirk
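
Curt's remark in the quoted message, that the Jaccard distance is one minus the Tanimoto similarity, can be checked on a toy example. The sketch below is not from the thread: it represents each fingerprint as a set of on-bit indices (made-up values), and tanimoto and jaccard_distance are hypothetical helpers written in plain Python rather than RDKit or SciPy calls.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    return len(a & b) / union


def jaccard_distance(a, b):
    """Jaccard distance: share of the bits set in either fingerprint that differ."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a ^ b) / union


fp1 = {1, 4, 7, 9}  # made-up on-bit indices
fp2 = {1, 4, 8}

sim = tanimoto(fp1, fp2)           # 2 shared bits out of 5 set in either
dist = jaccard_distance(fp1, fp2)  # 3 differing bits out of 5 set in either
print(sim, dist)
```

Because the shared bits and the differing bits together make up the union, the two values always sum to one for non-empty fingerprints.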
[Rdkit-discuss] Clustering - visualization?
Next up is clustering...

I've got about 350 structures to cluster and I've worked through the example code from the RDKit Cookbook (http://www.rdkit.org/docs/Cookbook.html#clustering-molecules). All seems well and good there, but I would like to see the dendrogram. I see that there is a ClusterVis module to generate images, PDF, and SVG, but all require a Cluster object as input. I don't find anywhere a description of acquiring or building that object based upon the results of clustering.

Any tips?

-Kirk

--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] GetSubstructMatch vs MMFFOptimize
RDKitters,

I'm working on a project in which I want to align a collection of structures with their most similar structures and display the results in PyMOL. To accomplish this, I've built a Python script similar to the one attached here in which I start with pairs of structures, find the MCS of those structures, create a template based on the MCS and a 3D conformation of the structure of interest, and then generate a constrained conformation of a query structure. I tried to comment the attached code enough to lead you through the process.

What I find is that quite often the ConstrainedEmbed() function fails with the error "molecule doesn't match the core", which seems very odd since the pairs for which it fails are very similar. The attached .png shows one such pair and their MCS.

What I've found is that when I generate a 3D conformation for the first structure and optimize it with MMFF (MMFFOptimize), this often causes GetSubstructMatch to fail to find the MCS within the structure. If instead I use UFFOptimize, everything seems to work OK most of the time.

In my code, I've noted where the error occurs and flanked it with some print statements to show what happens. Specifically, at line 36 I have the MMFFOptimize line, and at 37 the UFFOptimize line. I've also attached a set of structures for which MMFF fails.

While using UFFOptimize produces great results, I'm curious why MMFFOptimize creates a problem, and whether this is a bug which should be fixed or just a glitch related to atom typing and other parameterizations that occur with MMFF.

Thanks for any explanation or ideas.

-Kirk

    Struct1    Cc1cc(NC(=O)CSc2ccc3nnc(CCNC(=O)c4c4)n3n2)no1
    Struct2    CCOc1c1NC(=O)CSc1ccc2nnc(CCNC(=O)c3ccc(C)cc3)n2n1

    from copy import deepcopy
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import Draw
    from rdkit.Chem import rdFMCS
    from rdkit.Chem import PyMol

    pymol = PyMol.MolViewer()

    # get the structures
    fin = open('Substruct.txt', 'r')
    mols = []
    for l in fin:
        arr = l.strip().split('\t')
        mols.append(Chem.MolFromSmiles(arr[1]))

    # find the maximum common substructure
    mcs = rdFMCS.FindMCS(mols, completeRingsOnly=True, ringMatchesRingOnly=True)
    mcs_mol = Chem.MolFromSmarts(mcs.smartsString)

    # check the mcs - looks reasonable
    z = [AllChem.Compute2DCoords(m) for m in mols + [mcs_mol]]
    img = Draw.MolsToGridImage(mols + [mcs_mol], subImgSize=(300,300), legends=['Struct1', 'Struct2', 'MCS'])
    img.save('Substruct.png')

    # here's where the error occurs
    # before MMFF optimization, GetSubstructMatch is correct
    print(mols[0].GetSubstructMatch(mcs_mol))

    # create a 3D structure for the first
    AllChem.EmbedMolecule(mols[0])
    AllChem.MMFFOptimizeMolecule(mols[0])
    #AllChem.UFFOptimizeMolecule(mols[0])  # UFF works!

    # after MMFF optimization, substruct match no longer correct
    print(mols[0].GetSubstructMatch(mcs_mol))

    # create a template from the mcs and structure 1
    mcs_match = mols[0].GetSubstructMatch(mcs_mol)
    template = deepcopy(mols[0])
    for i, a in enumerate(template.GetAtoms()):
        if i not in mcs_match:
            template.GetAtomWithIdx(i).SetAtomicNum(0)
    template = Chem.DeleteSubstructs(template, Chem.MolFromSmarts('[#0]'))

    # create a 3d structure for the second constrained to the mcs
    mols[1] = AllChem.ConstrainedEmbed(mols[1], template)

    # show the results in PyMOL
    pymol.ShowMol(mols[0], name='Struct1')
    pymol.Zoom('Struct1')
    pymol.SetDisplayStyle('Struct1', 'sticks')
    pymol.ShowMol(mols[1], name='Struct2', showOnly=False)
Re: [Rdkit-discuss] PyMOL from RDKit? (Resurrection)
Paolo,

Thank you! That works perfectly. I'm not sure what step I was missing before, but your script does the trick.

In working with this, I found myself wanting to save some PyMOL files programmatically, but I see that there is not a Save option in the RDKit PyMOL code. I added the snip below to the MolViewer class and it seems to work nicely. I don't know if it is generally useful or if it should be added to the code base - I'll let Greg make that decision.

-Kirk

      def SaveFile(self, filename):
        id = self.server.save(filename)
        return id

I've attached my modified PyMol.py file as well.

On Fri, Apr 22, 2016 at 3:08 PM, Paolo Tosco <paolo.to...@unito.it> wrote:

> Dear Robert,
>
> I have just built the latest PyMOL 1.8.2.0 on CentOS 7, I started it:
>
>     pymol -R
>
> and then I ran the following Python script:
>
>     #!/usr/bin/env python
>
>     import os
>     import rdkit
>     from rdkit import Chem
>     from rdkit.Chem import PyMol
>     from rdkit.Chem import AllChem
>
>     s = PyMol.MolViewer()
>     mol = Chem.MolFromSmiles \
>         ('CCOCCn1c(C2CC[NH+](CCc3ccc(C(C)(C)C(=O)[O-])cc3)CC2)nc2c21')
>     mol = AllChem.AddHs(mol)
>     AllChem.EmbedMolecule(mol)
>     AllChem.MMFFOptimizeMolecule(mol)
>     s.ShowMol(mol, name = 'bilastine', showOnly = False)
>     s.Zoom('bilastine')
>     s.SetDisplayStyle('bilastine', 'sticks')
>
> I obtained the expected display:
>
> Cheers,
> p.
>
> On 04/22/2016 09:09 PM, Robert DeLisle wrote:
>
> Back again!
>
> I apologize for resurrecting an old topic, but I'm once again trying to
> work with PyMOL through RDKit. I've been following the approach in this
> thread (
> http://www.mail-archive.com/rdkit-discuss%40lists.sourceforge.net/msg00325.html)
> but it seems not to work any longer. I'm using PyMOL 1.8 on Fedora and I
> see that the xml-rpc file is current, so that's no longer a problem. When
> I step through the process and hit this step:
>
>     s.ShowMol(m,name='ligand',showOnly=False)
>
> nothing happens in the PyMOL viewer. It just remains blank.
>
> Any updates on operating with PyMOL?
>
> -Kirk

    # $Id$
    #
    # Copyright (C) 2004-2012 Greg Landrum and Rational Discovery LLC
    #
    # @@ All Rights Reserved @@
    # This file is part of the RDKit.
    # The contents are covered by the terms of the BSD license
    # which is included in the file license.txt, found at the root
    # of the RDKit source tree.
    #
    """ uses pymol to interact with molecules

    """
    from rdkit import Chem
    import os, tempfile

    # Python3 compatibility
    try:
      from xmlrpclib import Server
    except ImportError:
      from xmlrpc.client import Server

    _server=None
    class MolViewer(object):
      def __init__(self,host=None,port=9123,force=0,**kwargs):
        global _server
        if not force and _server is not None:
          self.server=_server
        else:
          if not host:
            host=os.environ.get('PYMOL_RPCHOST','localhost')
          _server=None
          serv = Server('http://%s:%d'%(host,port))
          serv.ping()
          _server = serv
          self.server=serv
        self.InitializePyMol()

      def InitializePyMol(self):
        """ does some initializations to set up PyMol according to our
        tastes

        """
        self.server.do('set valence,1')
        self.server.do('set stick_rad,0.15')
        self.server.do('set mouse_selection_mode,0')
        self.server.do('set line_width,2')
        self.server.do('set selection_width,10')
        self.server.do('set auto_zoom,0')

      def DeleteAll(self):
        " blows out everything in the viewer "
        self.server.deleteAll()

      def DeleteAllExcept(self,excludes):
        " deletes everything except the items in the provided list of arguments "
        allNames = self.server.getNames('*',False)
        for nm in allNames:
          if nm not in excludes:
            self.server.deleteObject(nm)

      def LoadFile(self,filename,name,showOnly=False):
        """ calls pymol's "load" command on the given filename; the loaded object
        is assigned the name "name"
        """
        if showOnly:
          self.DeleteAll()
        id = self.server.loadFile(filename,name)
        return id

      def SaveFile(self, filename):
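
How the proposed SaveFile method fits into MolViewer can be illustrated without a running PyMOL. In the sketch below, MiniViewer and RecordingServer are hypothetical stand-ins, not RDKit or PyMOL classes: the viewer forwards each call to whatever server object it is given, much as MolViewer does with its XML-RPC proxy, and the stand-in server only records what it was asked to do. It assumes, as the message above reports, that the server accepts a save call; the file names are made up.

```python
class RecordingServer:
    """Hypothetical stand-in for the XML-RPC proxy; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def loadFile(self, filename, name):
        self.calls.append(("loadFile", filename, name))
        return 1

    def save(self, filename):
        self.calls.append(("save", filename))
        return 1


class MiniViewer:
    """Cut-down viewer that, like MolViewer, delegates everything to self.server."""

    def __init__(self, server):
        self.server = server

    def LoadFile(self, filename, name):
        return self.server.loadFile(filename, name)

    def SaveFile(self, filename):
        # Same one-line delegation as the SaveFile snippet in the message.
        return self.server.save(filename)


server = RecordingServer()
viewer = MiniViewer(server)
viewer.LoadFile("ligand.mol", "ligand")
viewer.SaveFile("session.pse")
print(server.calls)
```

Passing the server in as an argument is a choice made for this sketch so that it can be exercised offline; MolViewer itself builds its proxy in __init__.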
[Rdkit-discuss] Aligning in 3D
In working with RDKit I've been able to align 2D structures based upon a common core or MCS using

    AllChem.GenerateDepictionMatching2DStructure(m,p)

The next step for me is to generate 3D structures and align them based upon that same common core. Obviously this leads to multiple steps, not the least of which is generating conformations that are consistent across the common core for the various molecules.

I seem to recall the ability to generate a conformation and minimize it (either UFF or MMFF) and apply constraints based upon an input substructure, but I cannot find the details. Any tips to accomplish this one?

-Kirk
[Rdkit-discuss] PyMOL from RDKit? (Resurrection)
Back again!

I apologize for resurrecting an old topic, but I'm once again trying to work with PyMOL through RDKit. I've been following the approach in this thread (http://www.mail-archive.com/rdkit-discuss%40lists.sourceforge.net/msg00325.html) but it seems not to work any longer. I'm using PyMOL 1.8 on Fedora and I see that the xml-rpc file is current, so that's no longer a problem. When I step through the process and hit this step:

    s.ShowMol(m,name='ligand',showOnly=False)

nothing happens in the PyMOL viewer. It just remains blank.

Any updates on operating with PyMOL?

-Kirk
Re: [Rdkit-discuss] cairoCanvas.py errors?
Even better! Thanks, Greg.

On Mon, Apr 18, 2016, 12:37 AM Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Kirk,
>
> Welcome back!
> Those were fixed for the 2015.09 release:
> https://github.com/rdkit/rdkit/pull/644
>
> Best,
> -greg
>
> On Mon, Apr 18, 2016 at 1:11 AM, Robert DeLisle <rkdeli...@gmail.com> wrote:
>
>> Long time no message!
>>
>> Anywho, I've been working today with RDKit 2015.03.01 and in the process
>> of generating a grid of molecule depictions (Draw.MolsToGridImage()), I
>> received the error message below.
>>
>> From the last line, it seems there has been an API change that changes
>> tostring() to tobytes(). I also found that fromstring() needs to change to
>> frombytes().
>>
>> When I made these changes and saved the results, everything works fine.
>> I thought it might be useful to know given the upcoming release.
>>
>> -Kirk
>>
>> Traceback (most recent call last):
>>   File "GenerateStructFigures.py", line 57, in <module>
>>     legends = lbls)
>>   File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/__init__.py", line 316, in MolsToGridImage
>>     **kwargs),(col*subImgSize[0],row*subImgSize[1]))
>>   File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/__init__.py", line 94, in MolToImage
>>     img,canvas=_createCanvas(size)
>>   File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/__init__.py", line 50, in _createCanvas
>>     canvas = Canvas(img)
>>   File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/cairoCanvas.py", line 67, in __init__
>>     imgd = image.tostring("raw","BGRA")
>>   File "/usr/lib64/python2.7/site-packages/PIL/Image.py", line 686, in tostring
>>     "Please call tobytes() instead.")
>> Exception: tostring() has been removed. Please call tobytes() instead.
[Rdkit-discuss] cairoCanvas.py errors?
Long time no message!

Anywho, I've been working today with RDKit 2015.03.01 and in the process of generating a grid of molecule depictions (Draw.MolsToGridImage()), I received the error message below.

From the last line, it seems there has been an API change that changes tostring() to tobytes(). I also found that fromstring() needs to change to frombytes().

When I made these changes and saved the results, everything works fine. I thought it might be useful to know given the upcoming release.

-Kirk

    Traceback (most recent call last):
      File "GenerateStructFigures.py", line 57, in <module>
        legends = lbls)
      File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/__init__.py", line 316, in MolsToGridImage
        **kwargs),(col*subImgSize[0],row*subImgSize[1]))
      File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/__init__.py", line 94, in MolToImage
        img,canvas=_createCanvas(size)
      File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/__init__.py", line 50, in _createCanvas
        canvas = Canvas(img)
      File "/storage/software/RDKit/RDKit_current/rdkit/Chem/Draw/cairoCanvas.py", line 67, in __init__
        imgd = image.tostring("raw","BGRA")
      File "/usr/lib64/python2.7/site-packages/PIL/Image.py", line 686, in tostring
        "Please call tobytes() instead.")
    Exception: tostring() has been removed. Please call tobytes() instead.
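
The tostring() to tobytes() rename described in this message can be bridged with a small compatibility helper. This is only a sketch and not the fix that went into RDKit: image_to_bytes is a hypothetical function, and the two tiny classes stand in for older and newer image objects so that the example runs without PIL installed.

```python
def image_to_bytes(image, *args):
    """Return raw image data, preferring tobytes() and falling back to tostring()."""
    encoder = getattr(image, "tobytes", None)
    if encoder is None:
        # Image objects without tobytes() only offer the older tostring().
        encoder = image.tostring
    return encoder(*args)


class OldStyleImage:
    """Stand-in for an image object that only has tostring()."""

    def tostring(self, *args):
        return b"old:" + ",".join(args).encode()


class NewStyleImage:
    """Stand-in for an image object whose tostring() raises, as in the traceback."""

    def tostring(self, *args):
        raise Exception("tostring() has been removed. Please call tobytes() instead.")

    def tobytes(self, *args):
        return b"new:" + ",".join(args).encode()


old_result = image_to_bytes(OldStyleImage(), "raw", "BGRA")
new_result = image_to_bytes(NewStyleImage(), "raw", "BGRA")
print(old_result, new_result)
```

Checking for tobytes() first matters here: on the newer object tostring() still exists but raises, so testing for its presence alone would not be enough.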
[Rdkit-discuss] RDKit from Java
RDkit-ers,

I've been working with RDKit from Java for a while now and I'm spinning my wheels due to being too new to Java. I'm very comfortable with RDKit from Python, but Java is a new animal for me.

I've downloaded the RDKit Java binaries and I have this:

    boost_system-vc100-mt-1_51.dll
    GraphMolWrap.dll
    org.RDKit.jar
    org.RDKitDoc.jar

The two DLLs are most likely C++ libraries that are compiled into .dll so that the code can use them. I know this is true for boost and I'm guessing the GraphMolWrap.dll is similar.

The two .jar files are the pieces of interest, but I cannot seem to find any documentation on RDKit from Java to get started. I can find some other examples - mostly the KNIME nodes for RDKit - that give me some clues toward function names, etc. but I'm stuck as to how to even get started.

I did find this: https://code.google.com/p/rdkit/wiki/SwigExperiment

At the bottom I see a Jython console session, but I'm just not able to convert this into a .java file which I can compile with javac and then actually run.

Any tips on how to import libraries into a very simple chunk of Java code? Or better yet, 5-10 lines of a .java file that does something mindlessly simple would be great to help me get started.

-Kirk
Re: [Rdkit-discuss] RDKit from Java
By the way, I'm looking through the Java wrapper and I'm not seeing any functions that would provide access to the 2D depiction code from Java. Does that exist and I'm just not seeing it?

-Kirk

On Sun, Nov 30, 2014 at 9:37 PM, Robert DeLisle <rkdeli...@gmail.com> wrote:

> Thanks, Greg! That helps a lot. I think I'm on the right track now, and if
> I can wrap my head around how to get all the dependencies to talk to each
> other appropriately (DLLs, etc.), I should be on my way. Any tips on
> configuration are always appreciated. (This is what I get for venturing
> into Java world, right?)
>
> As for the archaeology, like I tell everyone - I'm thorough. 8^D
>
> -Kirk
>
> On Sun, Nov 30, 2014 at 8:32 PM, Greg Landrum <greg.land...@gmail.com> wrote:
>
>> Hi Kirk,
>>
>> On Mon, Dec 1, 2014 at 2:14 AM, Robert DeLisle <rkdeli...@gmail.com> wrote:
>>
>>> I've been working with RDKit from Java for a while now and I'm spinning
>>> my wheels due to being too new to Java. I'm very comfortable with RDKit
>>> from Python, but Java is a new animal for me.
>>>
>>> I've downloaded the RDKit Java binaries and I have this:
>>>
>>>     boost_system-vc100-mt-1_51.dll
>>>     GraphMolWrap.dll
>>>     org.RDKit.jar
>>>     org.RDKitDoc.jar
>>>
>>> The two DLLs are most likely C++ libraries that are compiled into .dll
>>> so that the code can use them. I know this is true for boost and I'm
>>> guessing the GraphMolWrap.dll is similar.
>>
>> Exactly.
>>
>>> The two .jar files are the pieces of interest, but I cannot seem to find
>>> any documentation on RDKit from Java to get started. I can find some
>>> other examples - mostly the KNIME nodes for RDKit - that give me some
>>> clues toward function names, etc. but I'm stuck as to how to even get
>>> started.
>>
>> Yeah, the code for the knime nodes has too much knime and not enough
>> RDKit to be useful as a place to learn.
>>
>>> I did find this: https://code.google.com/p/rdkit/wiki/SwigExperiment
>>>
>>> At the bottom I see a Jython console session, but I'm just not able to
>>> convert this into a .java file which I can compile with javac and then
>>> actually run.
>>
>> Wow; that's ancient. Nice archaeology to find it. :-)
>>
>>> Any tips on how to import libraries into a very simple chunk of Java
>>> code? Or better yet, 5-10 lines of a .java file that does something
>>> mindlessly simple would be great to help me get started.
>>
>> The Java (and C#) wrappers are under-documented and there's very little
>> sample code out there. Probably your best bet is the testing code for the
>> java wrapper:
>> https://github.com/rdkit/rdkit/tree/master/Code/JavaWrappers/gmwrapper/src-test/org/RDKit
>>
>> This isn't comprehensive, but it does contain at least a starting point
>> for most of the functionality.
>>
>> Best,
>> -greg
Re: [Rdkit-discuss] RDKit from Java
Thanks again, Greg! I can imagine 2D depiction is a tricky bit of code. I'll leave that one to the experts.

All the best!
Kirk

On Nov 30, 2014 9:47 PM, Greg Landrum <greg.land...@gmail.com> wrote:

> On Mon, Dec 1, 2014 at 5:39 AM, Robert DeLisle <rkdeli...@gmail.com> wrote:
>
>> By the way, I'm looking through the Java wrapper and I'm not seeing any
>> functions that would provide access to the 2D depiction code from Java.
>> Does that exist and I'm just not seeing it?
>
> The only thing currently available is the ToSVG() method that's on the
> ROMol class. This is what is used within Knime.
>
> I do really hope to have better depiction options available for the next
> release -- Dave Cosgrove submitted an excellent starting point earlier
> this year -- but that's a time-consuming bit to get right.
>
> -greg
[Rdkit-discuss] RDKit on Win7 - DLL load failed
Hi again, all!

I'm trying to install RDKit on a 64-bit Windows 7 instance (in VirtualBox). I've done the following:

    installed Python 2.7 (32-bit)
    installed NumPy (for Python 2.7 32-bit)
    installed PIL (for Python 2.7 32-bit)

environment variables are:

    RDBASE = c:\RDKit_2014_09_1
    PYTHONPATH = %RDBASE%
    PATH = %PATH%;%RDBASE%\lib

From a Python instance, I get this:

    Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import rdkit
    >>> from rdkit import Chem
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\RDKit_2014_09_1\rdkit\Chem\__init__.py", line 18, in <module>
        from rdkit import rdBase
    ImportError: DLL load failed: %1 is not a valid Win32 application.

I've searched the discuss archives, and found details about making sure the VC++ redistributables are present - they are. I see in RDKit\lib there are two files named *vc100*.dll, so I assume that msvcp100.dll and msvcr100.dll are the correct versions. I've tried moving them to the RDKit\lib folder - no luck. I've also tried renaming any/all of them without the *100* version stamp - again, no luck.

Running dependency walker, PYTHON27.DLL
Re: [Rdkit-discuss] RDKit on Win7 - DLL load failed
OOPS!
Re: [Rdkit-discuss] RDKit on Win7 - DLL load failed
Let's try this one last time. Somehow I got two early sends of that e-mail. I apologize for the now triple post! As I was saying...

I'm trying to install RDKit on a 64-bit Windows 7 instance (in VirtualBox). I've done the following:

    installed Python 2.7 (32-bit)
    installed NumPy (for Python 2.7 32-bit)
    installed PIL (for Python 2.7 32-bit)

environment variables are:

    RDBASE = c:\RDKit_2014_09_1
    PYTHONPATH = %RDBASE%
    PATH = %PATH%;%RDBASE%\lib

From a Python instance, I get this:

    Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import rdkit
    >>> from rdkit import Chem
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\RDKit_2014_09_1\rdkit\Chem\__init__.py", line 18, in <module>
        from rdkit import rdBase
    ImportError: DLL load failed: %1 is not a valid Win32 application.

I've searched the discuss archives, and found details (links below) about making sure the VC++ redistributables are present - they are. I see in RDKit\lib there are two files named *vc100*.dll, so I assume that msvcp100.dll and msvcr100.dll are the correct versions. I've tried moving them to the RDKit\lib folder - no luck. I've also tried renaming any/all of them without the *100* version stamp - again, no luck.

Running dependency walker, PYTHON27.DLL is marked as not found. I do see it in C:\Windows\SysWOW64, however. It also appears that all of the marked DLLs have a 64 next to them, suggesting everything has been compiled for 64-bit. I've double checked that I am indeed using the 32-bit versions of RDKit. (I tried going to 64-bit, but I find that NumPy isn't available for 64-bit Python.)

The last thing I notice that seems off is that within the RDKit_2014_09_1.win32.py27.zip file, the compressed directory is actually titled RDKit_2014_03_1. I assume this is just a typo, but is it the right version?

Any help is greatly appreciated.

-Kirk

RDKit-discuss archive links:
http://www.mail-archive.com/rdkit-discuss%40lists.sourceforge.net/msg02558.html
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02381.html
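
A "not a valid Win32 application" failure is often a sign that 32-bit and 64-bit binaries are being mixed, so one quick diagnostic is to confirm what the running interpreter actually is. The check below is a minimal standard-library sketch; interpreter_bits is a name made up for this example, and it only reports on Python itself, not on the RDKit DLLs.

```python
import platform
import struct


def interpreter_bits():
    """Pointer size of the running Python in bits (32 or 64)."""
    return struct.calcsize("P") * 8


bits = interpreter_bits()
print("Python %s, %d-bit, machine %s" % (platform.python_version(), bits, platform.machine()))
```

A 32-bit Python on 64-bit Windows reports 32 here, and the compiled extension modules it imports need to match that value.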
Re: [Rdkit-discuss] RDKit on Win7 - DLL load failed
Done and done! Downloaded, unzipped, and working! Thank you, Greg.

And, no worries at all. If I had a nickel for every silly mistake I've made, well... how many stars are there in the universe?

Thanks also for the links to 64-bit NumPy. I'll definitely give those a go.

-Kirk

On Sun, Nov 9, 2014 at 9:26 PM, Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Kirk,
>
> It looks like I made a stupid mistake when creating the win32 binaries and zipped the wrong directory. :-(
> I just replaced the win32 binaries on both github and sf.net with new versions that should be correct.
>
> FYI, you can get win64 python binaries for numpy and many other useful packages here:
> http://www.lfd.uci.edu/~gohlke/pythonlibs/
>
> -greg
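For anyone who lands on this thread with the same "%1 is not a valid Win32 application" message: one thing that can be checked directly is whether the interpreter and the extension module were built for the same architecture. The sketch below is not from the thread. It reads the machine-type field from a file's PE header, and the helper names are invented for illustration.

```python
import struct

# Machine-type codes from the PE/COFF file header.
PE_MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def python_bitness():
    """Return 32 or 64 for the running interpreter (pointer size in bits)."""
    return struct.calcsize("P") * 8


def pe_machine(path):
    """Return a label for the architecture a Windows DLL or .pyd was built for."""
    with open(path, "rb") as handle:
        header = handle.read(64)
        if len(header) < 64 or header[:2] != b"MZ":
            raise ValueError("not a PE file: %s" % path)
        # Offset 0x3C of the DOS header holds the position of the PE signature.
        pe_offset = struct.unpack_from("<I", header, 0x3C)[0]
        handle.seek(pe_offset)
        signature = handle.read(6)
    if len(signature) < 6 or signature[:4] != b"PE\0\0":
        raise ValueError("missing PE signature: %s" % path)
    # The two bytes after the signature are the machine type.
    machine = struct.unpack_from("<H", signature, 4)[0]
    return PE_MACHINE_NAMES.get(machine, "unknown (0x%04x)" % machine)
```

Comparing `python_bitness()` with `pe_machine(...)` for the module named in the traceback (presumably something like `rdkit\rdBase.pyd` under RDBASE, though that exact file name is an assumption) would show a 32/64-bit mismatch without needing Dependency Walker.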
[Rdkit-discuss] RDKit from Java
Hello, all. Long time, no see.

I have a project in which an application is being developed in Java and I would like to use some of the RDKit functionality to enhance it. I can easily write the Python code to do what I need, but I need to get that into a form that can be accessed from Java.

The only solution I've come up with is to use something akin to py2exe, which has the nice feature of not requiring the full Python and RDKit installation on the target machine, but would require some type of intermediate step (probably a file process) to pass data between Java and the .exe. Ideally, it would be nice to pass the results through interfaces, but that's being quite hopeful.

I've searched through the RDKit-discuss archives for this type of thing, but I haven't seen anything that really answers my question. Also, I know there are RDKit KNIME nodes, so surely there's a direct way to do this that I'm not aware of.

Any suggestions or tips are greatly appreciated!

-Kirk
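One low-tech variant of the intermediate step described above is to skip files and exchange lines of text over standard input and output, which a Java program can drive through `java.lang.ProcessBuilder`. A minimal sketch on the Python side, assuming one SMILES per input line and tab-separated output; the function names `serve` and `canonical_smiles` are hypothetical and not part of RDKit.

```python
import sys


def canonical_smiles(smiles):
    """Example handler: canonicalize one SMILES string with RDKit."""
    from rdkit import Chem  # imported lazily so serve() is usable without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES")
    return Chem.MolToSmiles(mol)


def serve(instream, outstream, handler):
    """Read one record per line, write 'input<TAB>status<TAB>result' lines."""
    for line in instream:
        record = line.strip()
        if not record:
            continue
        try:
            status, result = "OK", handler(record)
        except Exception as exc:  # report the failure instead of dying mid-stream
            status, result = "ERROR", str(exc)
        outstream.write("%s\t%s\t%s\n" % (record, status, result))
        outstream.flush()  # lets the calling process read results as they arrive


# As a script, the entry point would be:
# serve(sys.stdin, sys.stdout, canonical_smiles)
```

The Java side would write SMILES to the process's input stream and read the result lines back. The Q2 2011 release notes elsewhere in this archive also mention Java wrappers under $RDBASE/Code/JavaWrappers, which may avoid the subprocess altogether.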
Re: [Rdkit-discuss] Editable molecule confusion
Another attempt - see Code Block 3, below. In this case, I construct the ring systems from the ground up using an EditableMol. Once again it fails during sanitization, but now I think I know why. The original structure has an indole with a substituted nitrogen. During building, that nitrogen does not have a hydrogen attached, so the valence is not satisfied and sanitization fails. If I change this to an indane, it works just fine.

The problem is, I cannot add hydrogens to the nitrogen until after the EditableMol is converted to a Mol, but I cannot convert it to a Mol until hydrogens are added. All of this would require some fairly sophisticated logic about the nitrogen which I'm not sure I want to include for this simple task.

Code Block 3:

    from rdkit import Chem
    from rdkit.Chem import AllChem

    sdin = Chem.SDMolSupplier('test.sdf')
    sdout = Chem.SDWriter('rings.sdf')

    for m in sdin:
        em = Chem.EditableMol(Chem.Mol())
        indexmap = {}
        for a in m.GetAtoms():
            if ( a.IsInRing() ):
                indexmap[a.GetIdx()] = em.AddAtom(Chem.Atom(a.GetAtomicNum()))
        for b in m.GetBonds():
            if ( b.IsInRing() ):
                em.AddBond( indexmap[b.GetBeginAtomIdx()],indexmap[b.GetEndAtomIdx()],b.GetBondType() )
        for nm in Chem.GetMolFrags(em.GetMol(), asMols=True):
            AllChem.Compute2DCoords(nm)
            sdout.write(nm)

On Fri, May 31, 2013 at 2:41 PM, Robert DeLisle <rkdeli...@gmail.com> wrote:

> I am attempting to reduce a molecule (attached SDF) to just its ring systems using Code Block 1 at the bottom. The problem is that when I get through the loops removing non-ring atoms/bonds, and convert the EditableMol back to a Mol, I end up with 7 disjoint sets of atoms:
>
>     ((0, 1, 2, 3, 4, 5, 6, 7, 8), (9, 10, 11, 12, 13, 14), (15,), (16,), (17, 20), (18,), (19,))
>
> It appears that when I remove an atom from the EditableMol by index, the indices are reassigned. I tried to test this with the inelegant code in Code Block 2, which gives me the expected sets of atom indices with respect to number and size:
>
>     ((0, 1, 2, 3, 4, 5, 6, 7, 8), (9, 10, 11, 12, 13, 14), (15, 16, 17, 18, 19, 20))
>
> - but it still fails to sanitize when I convert back to a Mol.
>
> What am I missing here? Also, is there an easier (i.e., existing) way to do this? I'm just looking to reduce the molecule to its ring systems and write those to an SD file.
>
> -Kirk
>
> Code block 1:
>
>     from rdkit import Chem
>
>     sdin = Chem.SDMolSupplier('test.sdf')
>     sdout = Chem.SDWriter('rings.sdf')
>
>     for m in sdin:
>         print len(m.GetBonds()),len(m.GetAtoms())
>         em = Chem.EditableMol(m)
>         for a in m.GetAtoms():
>             if ( not a.IsInRing() ):
>                 em.RemoveAtom(a.GetIdx())
>                 print a.GetIdx(), m.GetAtomWithIdx(a.GetIdx()).GetSymbol()
>         for b in m.GetBonds():
>             if ( not b.IsInRing() ):
>                 a1 = b.GetBeginAtomIdx()
>                 a2 = b.GetEndAtomIdx()
>                 em.RemoveBond(a1,a2)
>         m3 = em.GetMol()
>         print len(m3.GetBonds()), len(m3.GetAtoms())
>         f = Chem.GetMolFrags(em.GetMol())
>         print f
>         #for f in Chem.GetMolFrags(m3,asMols = True):
>             #sdout.write(f)
>
> Code block 2:
>
>     from rdkit import Chem
>
>     sdin = Chem.SDMolSupplier('test.sdf')
>     sdout = Chem.SDWriter('rings.sdf')
>
>     for m in sdin:
>         print len(m.GetBonds()),len(m.GetAtoms())
>         em = Chem.EditableMol(m)
>         active = True
>         while ( active == True ):
>             active = False
>             for a in m.GetAtoms():
>                 if ( not a.IsInRing() ):
>                     print a.GetIdx(), m.GetAtomWithIdx(a.GetIdx()).GetSymbol()
>                     em.RemoveAtom(a.GetIdx())
>                     active = True
>                     m=em.GetMol()
>                     em = Chem.EditableMol(m)
>                     break
>         active = True
>         while ( active == True):
>             active = False
>             for b in m.GetBonds():
>                 if ( not b.IsInRing() ):
>                     active = True
>                     a1 = b.GetBeginAtomIdx()
>                     a2 = b.GetEndAtomIdx()
>                     em.RemoveBond(a1,a2)
>                     m=em.GetMol()
>                     em = Chem.EditableMol(m)
>                     break
>         m3 = em.GetMol()
>         print len(m3.GetBonds()), len(m3.GetAtoms())
>         f = Chem.GetMolFrags(em.GetMol())
>         print f
>         #for f in Chem.GetMolFrags(m3,asMols = True):
>             #sdout.write(f)
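The index reassignment described in the quoted message can be reproduced without RDKit. The sketch below is a toy model, not RDKit code: a plain Python list stands in for the atom list, and it assumes (as the fragment output in the message suggests) that removing an entry shifts every later index down by one. The helper names are made up for illustration.

```python
def remove_ascending(atoms, doomed):
    """Problem pattern: delete by original index, lowest first."""
    atoms = list(atoms)
    for idx in sorted(doomed):
        if idx < len(atoms):
            del atoms[idx]  # later entries shift down, so the next idx is stale
    return atoms


def remove_descending(atoms, doomed):
    """Safer pattern: delete highest index first, so pending indices stay valid."""
    atoms = list(atoms)
    for idx in sorted(doomed, reverse=True):
        del atoms[idx]
    return atoms


atoms = ["C0", "N1", "C2", "O3", "C4", "C5"]
doomed = [1, 3]  # the intent is to drop N1 and O3

print(remove_ascending(atoms, doomed))   # drops N1, then C4 by mistake
print(remove_descending(atoms, doomed))  # drops O3, then N1 as intended
```

With an EditableMol the same idea would be to collect the indices of the non-ring atoms first and then call RemoveAtom on them in descending order, rather than removing while iterating over the original molecule.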
[Rdkit-discuss] RDKit_2012_09_1 build errors
Long time no email. I'm attempting to build RDKit on CentOS 5.8 and I'm getting the following error:

    In file included from /usr/local/include/boost/fusion/include/std_pair.hpp:10:0,
                     from /usr/local/include/boost/math/tools/tuple.hpp:90,
                     from /usr/local/include/boost/math/special_functions/detail/igamma_inverse.hpp:13,
                     from /usr/local/include/boost/math/special_functions/gamma.hpp:1543,
                     from /usr/local/include/boost/math/special_functions/detail/bessel_jy.hpp:14,
                     from /usr/local/include/boost/math/special_functions/bessel.hpp:17,
                     from /usr/local/include/boost/math/special_functions.hpp:18,
                     from /usr/local/include/boost/random/generate_canonical.hpp:22,
                     from /usr/local/include/boost/random.hpp:52,
                     from /opt/RDKit_current/Code/RDGeneral/utils.h:17,
                     from /opt/RDKit_current/Code/RDGeneral/utils.cpp:11:
    /usr/local/include/boost/fusion/adapted/std_pair.hpp:17:1: error: ‘access’ is not a class or namespace
    /usr/local/include/boost/fusion/adapted/std_pair.hpp:17:1: error: expected unqualified-id before ‘’ token
    /usr/local/include/boost/fusion/adapted/std_pair.hpp:17:1: error: ‘access’ is not a class or namespace
    /usr/local/include/boost/fusion/adapted/std_pair.hpp:17:1: error: expected unqualified-id before ‘’ token

This clearly seems to be a Boost problem, but I'm just not able to track it down. I followed these instructions: http://code.google.com/p/rdkit/wiki/BuildingOnCentOS57, and it appears Boost 1.48 built OK. Any tips?

-Kirk
[Rdkit-discuss] Fwd: Building on CentOS 5.8: Python-related tests fail
Just my $0.02. You may realize and have tried all of this already, but...

I've spent a lot of time getting RDKit built on CentOS since version 5.4. The newer versions make this much easier with updated CMake, GCC, etc. One problem that I've had is trying to build while CentOS' standard (i.e., old) Boost libraries are still installed. I know that CMake has some flags for setting the Boost library location, but I could never get them to work: the build never saw the newly built Boost library while the system standard was present. The only thing that worked for me was to remove the system Boost and build my own. The make system then finds the custom build without a problem.

-Kirk

On Fri, Jun 22, 2012 at 9:46 AM, Leonardo Trabuco <ltrab...@gmail.com> wrote:

> Hi Greg,
>
> Thanks for following up. Below is the output you asked for. Looks like an import error in the boost library. Any ideas?
>
> Thanks again,
> Leo
>
> UpdateCTestConfiguration from :/net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/DartConfiguration.tcl
> Start processing tests
> UpdateCTestConfiguration from :/net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/DartConfiguration.tcl
> Test project /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build
> Constructing a list of tests
> Done constructing a list of tests
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/RDGeneral
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/DataStructs
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/DataStructs/Wrap
> 3/ 76 Testing pyBV
> Test command: /usr/bin/python2.6 /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/Code/DataStructs/Wrap/testBV.py
> Test timeout computed to be: 9.99988e+06
> Traceback (most recent call last):
>   File "/net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/Code/DataStructs/Wrap/testBV.py", line 1, in <module>
>     from rdkit import DataStructs
>   File "/net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/rdkit/DataStructs/__init__.py", line 11, in <module>
>     from rdkit import rdBase
> ImportError: /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/boost/lib/libboost_python.so.1.49.0: undefined symbol: Py_InitModule4
> -- Process completed
> ***Failed
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/Geometry
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/Geometry/Wrap
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/Numerics
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/Numerics/Alignment
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/Numerics/Alignment/Wrap
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/Numerics/Optimizer
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/ForceField
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/DistGeom
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/DistGeom/Wrap
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/Depictor
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/Depictor/Wrap
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/SmilesParse
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/FileParsers
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/Substruct
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/ChemReactions
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/ChemReactions/Wrap
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/ChemTransforms
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/Subgraphs
> Changing directory into /net/netfile2/ag-russell/install/CentOS-5.8-x86_64/RDKit_2012_03_1/build/Code/GraphMol/FragCatalog
> Changing directory into
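The situation described in this thread, where a build or test run may be picking up a different Boost than the one intended, can be narrowed down by listing every Boost.Python shared library visible in the directories of interest. A minimal sketch; the helper names and the example directories in the usage note are assumptions for illustration and have nothing to do with the RDKit build system itself.

```python
import glob
import os


def find_boost_python_libs(search_dirs):
    """Return sorted paths of libboost_python shared libraries in search_dirs."""
    found = []
    for directory in search_dirs:
        pattern = os.path.join(directory, "libboost_python*.so*")
        found.extend(glob.glob(pattern))
    return sorted(found)


def ld_library_dirs(environ=None):
    """Split LD_LIBRARY_PATH into a list of non-empty directories."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LD_LIBRARY_PATH", "")
    return [d for d in raw.split(os.pathsep) if d]
```

Calling `find_boost_python_libs(ld_library_dirs() + ["/usr/lib64", "/usr/local/lib"])` and seeing more than one version in the result would be a hint that a system Boost and a custom Boost are both in play.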
[Rdkit-discuss] Giant SD file with RDKit
RDKit-sters,

I'm working with a huge SD file that, by every way I can measure it, contains ~5,050,000 structures. (This is an eMolecules dataset.) In processing the file, I've run into an odd error. Even with the following very simple code, the file seems to be bottomless. I let it run overnight and I saw numbers as high as 42,000,000.

Any ideas?

-Kirk

    from rdkit import Chem

    # The file name is missing from the archived message; 'INPUT.sdf' is a placeholder.
    sdin = Chem.SDMolSupplier('INPUT.sdf')
    for i, m in enumerate(sdin):
        if ( i % 10 == 0 ):
            print('Structure #' + str(i))
Re: [Rdkit-discuss] Giant SD file with RDKit
Eddie,

Thanks for the quick response. I checked the file as you suggested and I get this:

    0000000 2424 2424 000a
    0000005

So it appears to end with a line feed (0x0a), correct?

Getting the file to you might be a trick as it is over 4 GB compressed. My intention was to partition the file into multiple, smaller files, but this weird error occurred.

-Kirk

On Mon, Nov 21, 2011 at 11:42 AM, Eddie Cao <eddie@me.com> wrote:

> Hi Robert,
>
> It might help to create a small SD file consisting only of the last few structures in the SD file to make sure the error was not because the file does not end properly. Specifically, the latest RDKit release has a bug that causes it to get stuck if the file does not end with a line-feed character (0x0a). An easy way to check is to run `tail -1 INPUT.sdf | hexdump`. If the last character is not 0a, then you are a victim of this bug. The following example uses a bad SDF that ends with character 24:
>
>     $ tail -1 test.sdf | hexdump
>     0000000 24 24 24 24
>     0000004
>
> If you provide a link to the SD file, I can also help you check.
>
> Eddie
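The `tail | hexdump` check from this exchange can also be done from Python without reading the whole file, which matters for a multi-gigabyte input. A minimal sketch; the function name is made up for illustration.

```python
import os


def ends_with_linefeed(path):
    """Return True if the last byte of the file at path is a line feed (0x0a)."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # an empty file has no last byte
        handle.seek(-1, os.SEEK_END)  # jump straight to the final byte
        return handle.read(1) == b"\n"
```

For the two cases shown in the messages, a file ending in `$$$$` plus a line feed would give True and one ending in a bare `$$$$` would give False.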
Re: [Rdkit-discuss] Giant SD file with RDKit
Andrew,

Good catch! I had wondered if there might be a size problem but couldn't make the connection that you made. I'll find another method to partition the file.

-Kirk

On Mon, Nov 21, 2011 at 12:01 PM, Andrew Dalke <da...@dalkescientific.com> wrote:

> On Nov 21, 2011, at 7:47 PM, Robert DeLisle wrote:
> > Getting the file to you might be a trick as it is over 4 GB compressed.
>
> I think that's a clue. RDKit uses tell/seek operations on the underlying file stream, like this:
>
>     ROMol *SDMolSupplier::next() {
>       PRECONDITION(dp_inStream,"no stream");
>       // set the stream to the current position
>       dp_inStream->seekg(d_molpos[d_last]);
>
> d_molpos contains std::streampos elements,
>
>     MolSupplier.h:  std::vector<std::streampos> d_molpos; // vector of positions in the file for molecules
>
> and I can't tell if that's a 32-bit or 64-bit value, but there's code which assumes it's an unsigned 32-bit integer:
>
>     std::string SDMolSupplier::getItemText(unsigned int idx){
>       PRECONDITION(dp_inStream,"no stream");
>       unsigned int holder=d_last;
>       moveTo(idx);
>       unsigned int begP=d_molpos[idx];
>       unsigned int endP;
>       try {
>
> My guess is that there's an overflow in this code, causing it to loop from 2**32 back to 0.
>
> Andrew
> da...@dalkescientific.com
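The wrap-around suspected in this message can be illustrated with plain integer arithmetic. This is only a model of what storing a byte offset in an unsigned 32-bit variable does; it is not RDKit's code, and the function names are invented for the example.

```python
UINT32_MODULUS = 2 ** 32  # number of distinct values in an unsigned 32-bit int


def as_uint32(offset):
    """Return the value an unsigned 32-bit variable would hold for this offset."""
    return offset % UINT32_MODULUS


def first_wrapped_record(record_size):
    """Index of the first fixed-size record whose start offset no longer fits."""
    return -(-UINT32_MODULUS // record_size)  # ceiling division


# An offset just past 4 GiB comes back as a position near the start of the file.
print(as_uint32(2 ** 32 + 10))
```

A reader that seeks to the truncated offset lands back near the beginning of the file once the true offset passes 4 GiB, which would be consistent with a file of about 5 million structures appearing to contain 42 million.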
Re: [Rdkit-discuss] Giant SD file with RDKit
Andrew,

In thinking about this, an unsigned 32-bit integer should give me over 4 billion values, and a signed 32-bit gives 2 billion. I know that the file has slightly over 5 million structures and ~300 million lines. Neither of these is over the limit, so I wouldn't expect an overflow.

-Kirk
Re: [Rdkit-discuss] Giant SD file with RDKit
Andrew - thank you for the clarification. Obviously a character offset into the file makes much more sense than a line offset. oops. 8^)

Greg - thanks for the link. I may give that a try. I have a different approach in place now, so this file is taken care of. I genuinely hope I don't have to process this many structures too often 8^) but I'll certainly give the ForwardSDMolSupplier a try just in case I do.

On Mon, Nov 21, 2011 at 1:00 PM, Greg Landrum <greg.land...@gmail.com> wrote:

> Kirk,
>
> On Mon, Nov 21, 2011 at 8:42 PM, Robert DeLisle <rkdeli...@gmail.com> wrote:
> > In thinking about this, an unsigned 32-bit integer should give me over 4 billion values, and a signed 32-bit gives 2 billion. I know that the file has slightly over 5 million structures and ~300 million lines. Neither of these is over the limit, so I wouldn't expect an overflow.
>
> The determining factor is, unfortunately, the file size, not the number of lines.
>
> If you're willing to live on the bleeding edge for a bit, there's an RDKit branch that contains a new way of working with SD files that is well suited to dealing with large files:
> https://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/StreambufSupport_18Nov2011
>
> The new feature is the ForwardSDMolSupplier. This can be initialized from a filename:
>
>     In [3]: suppl = Chem.ForwardSDMolSupplier('PubChemBackground.sdf')
>
> or a python file-like object:
>
>     In [4]: suppl2 = Chem.ForwardSDMolSupplier(file('PubChemBackground.sdf'))
>
> You can read out molecules by looping over the supplier:
>
>     In [5]: for mol in suppl2:
>        ...:     if mol is None: continue
>        ...:     print mol.GetNumAtoms()
>        ...:
>     24
>     17
>
> Since these work using file-like objects, you can directly read from compressed files:
>
>     In [6]: suppl3 = Chem.ForwardSDMolSupplier(gzip.open('bigfile.sdf.gz'))
>
> The differences to the standard SDMolSupplier:
> - the ForwardSDMolSupplier is not random access; you cannot ask for a particular item
> - there's no reset method; if you want to go through the molecules more than once, you have to create the supplier from scratch.
>
> Coincidentally, this was inspired by some suggestions Andrew has made in the last week or so.
>
> I will be merging this branch back into the trunk sometime in the next week, but the code is there, mostly tested, and usable now.
>
> -greg
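For the original goal in this thread, partitioning a very large SD file, a forward-only pass that never seeks is enough, since each SD record is terminated by a line containing `$$$$`. The sketch below does this in plain Python without RDKit. It is not the "different approach" mentioned in the reply (which the thread does not describe), and the function name `split_sdf` and the output naming pattern are assumptions for illustration.

```python
def split_sdf(path, records_per_file, out_pattern="part_%04d.sdf"):
    """Copy an SD file into numbered chunks of at most records_per_file records.

    The input is read line by line in binary mode, so memory use does not
    depend on the input size and no seek/tell offsets are involved.
    Returns the list of files written.
    """
    written = []
    out = None
    count = 0
    with open(path, "rb") as source:
        for line in source:
            if out is None:
                name = out_pattern % len(written)
                out = open(name, "wb")
                written.append(name)
            out.write(line)
            if line.rstrip(b"\r\n") == b"$$$$":  # SD record terminator
                count += 1
                if count == records_per_file:
                    out.close()
                    out, count = None, 0
    if out is not None:
        out.close()
    return written
```

Something like `split_sdf('INPUT.sdf', 500000)` would then produce files that each stay well under the 4 GB size at which the offset problem discussed above appears, assuming the records are of ordinary size.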
Re: [Rdkit-discuss] [Rdkit-devel] Beta of Q2 2011 Release Available
It works here on CentOS 5.6. Testing with my code goes fine, but the test step (ctest from the build directory) results in 72/76 tests failed. Problem with the test DB?

On Fri, Jul 1, 2011 at 5:40 AM, Greg Landrum <greg.land...@gmail.com> wrote:

> Dear all,
>
> This morning I tagged the beta for the Q2 2011 (2011.06 in the new numbering) release in svn:
> http://rdkit.svn.sourceforge.net/viewvc/rdkit/tags/Release_2011_06_1beta1/
> and uploaded a source distribution to the google code site:
> http://code.google.com/p/rdkit/downloads/detail?name=RDKit_2011_06_1beta1.tgz
>
> If there's demand for it, I will also put up a windows binary.
>
> As usual: if no show-stopper bugs appear, I will do the release itself in about a week.
>
> Excerpts from the release notes are below. One highlight I will call your attention to is that, thanks to some nice work from Eddie Cao, it is now possible to generate InChI codes from within the RDKit:
>
>     In [2]: inchi = Chem.MolToInchi(Chem.MolFromSmiles('c1ccccc1C(=O)O'))
>     In [3]: print inchi
>     InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
>
> and then convert the InChIs to InChI keys:
>
>     In [4]: print Chem.InchiToInchiKey(inchi)
>     WPYMKLBDIGXBTP-UHFFFAOYSA-N
>
> There is also experimental and partial support for converting InChI back into a molecule:
>
>     In [5]: m2 = Chem.MolFromInchi(inchi)
>     In [6]: print Chem.MolToSmiles(m2)
>     O=C(O)c1ccccc1
>
> Note that this last bit is not something InChI is actually designed for, so it's probably not a good idea to rely on it.
>
> Best Regards,
> -greg
>
> ****** Release_2011.06.1 ******
> (Changes relative to Release_2011.03.2)
>
> Acknowledgements:
> - Eddie Cao, Andrew Dalke, James Davidson, JP Ebejer, Bernd Wiswedel
>
> Bug Fixes:
> - A problem with similarity values between SparseIntVects that contain negative values was fixed. (Issue 3295215)
> - An edge case in SmilesMolSupplier.GetItemText() was fixed. (Issue 3299878)
> - The drawing code now uses dashed lines for aromatic bonds without kekulization. (Issue 3305420)
> - AllChem.ConstrainedEmbed works again. (Issue 3305420)
> - atomic RGP values from mol files are accessible from python (Issue 3313539)
> - M RGP blocks are now written to mol files. (Issue 3313540)
> - Atom.GetSymbol() for R atoms read from mol files is now correct. (Issue 3316600)
> - The handling of isotope specifications is more robust.
> - A thread-safety problem in SmilesWrite::GetAtomSmiles() was fixed.
> - some of the MACCS keys definitions have been corrected
>
> New Features:
> - The smiles, smarts, and reaction smarts parsers all now take an additional argument, replacements, that carries out string substitutions pre-parsing.
> - There is now optional support for generating InChI codes and keys for molecules.
> - the atom pair and topological torsion fingerprint generators now take an optional ignoreAtoms argument
> - a function to calculate exact molecular weight was added.
> - new java wrappers are now available in $RDBASE/Code/JavaWrappers
> - the methods getMostCommonIsotope() and getMostCommonIsotopeMass() have been added to the PeriodicTable class.
>
> New Database Cartridge Features:
>
> Deprecated modules (to be removed in next release):
> - The original SWIG wrappers in $RDBASE/Code/Demos/SWIG are deprecated
>
> Removed modules:
>
> Other:
> - The quality of the drawings produced by both the python molecule drawing code and $RDBASE/Code/Demos/RDKit/Draw is better.
> - the python molecule drawing code will now use superscripts and subscripts appropriately when using the aggdraw or cairo canvases (cairo canvas requires pango for this to work).
> - $RDBASE/Code/Demos/RDKit/Draw now includes an example using cairo
> - A lot of compiler warnings were cleaned up.
> - The error reporting in the SMILES, SMARTS, and SLN parsers was improved.
> - the code for calculating molecular formula is now in C++ (Descriptors::calcMolFormula())
>
> --
> Rdkit-devel mailing list
> rdkit-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-devel
Re: [Rdkit-discuss] Friday evening problem... Centos RDKit -- again
I will defer to Greg's expertise for a more accurate answer, but I would suspect that the problem is the difference in using the system version of Python and a version of RDKit that is built with a newer version of GCC. You may be getting stuck in dependency confusion between the two versions. You should be able to build and install Python 2.7 without disturbing the system's Python 2.4.3.

-Kirk

On Fri, Jun 10, 2011 at 12:00 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
> I am installing the brand new RDKit (2011_03_2) on CentOS (lol!) on a Friday evening (6.54pm here in Oxford)... So I probably deserve the misery of the following.
>
> I have already gone through the whole RDKit on Centos installation procedure and pain on other machines and I now am undaunted by it. Bring it on.
>
> Still I installed everything (almost) according to the book (http://code.google.com/p/rdkit/wiki/BuildingOnCentOS) with the exception that I stuck to Python 2.4.3 (Python 2.7 doesn't play nicely with Rocks).
>
> And I get this anti-fancy error message:
>
> from rdkit import Chem
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/share/apps/RDKit_2011_03_2/rdkit/Chem/__init__.py", line 18, in ?
>     from rdkit import rdBase
> ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /share/apps/RDKit_2011_03_2/rdkit/rdBase.so)
>
> Any ideas?
>
> [jp@xxx build]$ echo $LD_LIBRARY_PATH
> /share/apps/RDKit_2011_03_2/lib:/share/apps/boost_1_46_1/lib:/opt/gridengine/lib/lx26-amd64:/share/apps/openbabel/lib:/usr/local/lib:/share/apps/openbabel/lib:
> [jp@xxx build]$ echo $PYTHONPATH
> :/share/apps/RDKit_2011_03_2
> [jp@xxx build]$ echo $RDBASE
> /share/apps/RDKit_2011_03_2
>
> Any sympathy will be greatly appreciated.
>
> Cheers
> JP
Re: [Rdkit-discuss] Friday evening problem... Centos RDKit -- again
Check the version of gcc for that working build. It may not match yours. Not that it fixes your problem, unfortunately.

On Jun 10, 2011 1:14 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
> It seems someone got it to work with python 2.4 on Centos (at least according to http://code.google.com/p/rdkit/wiki/WorkingBuilds). But even this is god knows how many permutations (gcc / boost / mpfr / gmp / bison / flex etc) away from mine... I'd be interested in Greg's take on supported platforms. What a start to the weekend!
>
> On 10 June 2011 20:04, Robert DeLisle <rkdeli...@gmail.com> wrote:
>> I can't blame you there. One ring to bind them would be preferred. Have you searched the RDKit discussion list archives regarding Python version compatibility? I vaguely remember something about older versions of Python in general, but I don't know if it applies to this case.
>>
>> On Fri, Jun 10, 2011 at 1:00 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
>>> Hi there Kirk,
>>>
>>> Your suggestion was interesting to tinker with -- but it doesn't help my specific case. If I set the environment to work with python 2.7 (and RDKit), I break ROCKs functionality which I need from time to time. I do not want to keep switching between p2.4 and p2.7 in the same session...
Re: [Rdkit-discuss] Friday evening problem... Centos RDKit -- again
Greg and JP,

For my own education, could this be related to having upgraded GCC through the CentOS install instructions, but using the old Python? The new version of RDKit would have been built with the newer GCC but the old Python may not refer to the correct libraries? Or am I mixing concepts here?

-Kirk

On Fri, Jun 10, 2011 at 1:30 PM, Greg Landrum <greg.land...@gmail.com> wrote:
> Hi
>
> On Friday, June 10, 2011, JP <jeanpaul.ebe...@inhibox.com> wrote:
>> I am installing the brand new RDKit (2011_03_2) on CentOS (lol!) on a Friday evening (6.54pm here in Oxford)... So I probably deserve the misery of the following.
>
> Nobody deserves the misery of working with Centos. ;-)
>
>> I have already gone through the whole RDKit on Centos installation procedure and pain on other machines and I now am undaunted by it. Bring it on.
>
> Good attitude!
>
>> Still I installed everything (almost) according to the book (http://code.google.com/p/rdkit/wiki/BuildingOnCentOS) with the exception that I stuck to Python 2.4.3 (Python 2.7 doesn't play nicely with Rocks).
>>
>> And I get this anti-fancy error message:
>>
>> from rdkit import Chem
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>>   File "/share/apps/RDKit_2011_03_2/rdkit/Chem/__init__.py", line 18, in ?
>>     from rdkit import rdBase
>> ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /share/apps/RDKit_2011_03_2/rdkit/rdBase.so)
>
> That is a glibc problem. It means that you are using something that has been built with a version of g++ that is more modern than the version of libstdc++ (I think) that is being found. You might want to google around a little bit for the error message in combination with centos and see what you can find.
>
> Believe it or not, this doesn't have much to do with the rdkit.
>
> -greg
>
>> Any ideas?
>>
>> [jp@xxx build]$ echo $LD_LIBRARY_PATH
>> /share/apps/RDKit_2011_03_2/lib:/share/apps/boost_1_46_1/lib:/opt/gridengine/lib/lx26-amd64:/share/apps/openbabel/lib:/usr/local/lib:/share/apps/openbabel/lib:
>> [jp@xxx build]$ echo $PYTHONPATH
>> :/share/apps/RDKit_2011_03_2
>> [jp@xxx build]$ echo $RDBASE
>> /share/apps/RDKit_2011_03_2
>>
>> Any sympathy will be greatly appreciated.
>>
>> Cheers
>> JP
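For anyone hitting the same ImportError: the GLIBCXX_x.y.z tags are symbol-version names embedded in libstdc++.so.6 itself, so one way to see what the library being picked up actually provides is to scan the file for those strings (roughly what running strings on it and grepping for GLIBCXX would show). The following is only a minimal sketch under a few assumptions: it is written for Python 3 rather than the Python 2.4 in this thread, and the function names glibcxx_versions and provides are made up for illustration.

```python
import re

# Matches version tags such as GLIBCXX_3.4.9, but not names like GLIBCXX_FORCE_NEW.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw library bytes."""
    found = {m.group(1).decode("ascii") for m in _TAG.finditer(blob)}
    # Sort numerically so that 3.4.10 comes after 3.4.8.
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(blob, wanted):
    """True if the library bytes advertise the wanted version, e.g. '3.4.9'."""
    return wanted in glibcxx_versions(blob)
```

To use it, read the library in binary mode, for example open("/usr/lib64/libstdc++.so.6", "rb").read() (the path from the traceback above), and pass the bytes to provides(data, "3.4.9"). If that returns False while a newer libstdc++ exists elsewhere on the machine, the loader is most likely finding the older system copy first.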
[Rdkit-discuss] Fwd: Re: Re: Re: Bug in MolToFile?
Keeping everyone on the thread..

---------- Forwarded message ----------
From: Robert DeLisle <rkdeli...@gmail.com>
Date: Apr 19, 2011 3:42 PM
Subject: Re: Re: Re: [Rdkit-discuss] Bug in MolToFile?
To: Greg Landrum <greg.land...@gmail.com>

Greg,

Thank you again for the off-line assistance. Just to update the status for the others out there, the new Draw code does work in my hands. And, it is as simple as downloading just the /rdkit/Chem/Draw directory from the SourceForge svn trunk and copying it into the existing source tree. Downloading that same directory from the Google Code trunk isn't very useful - oops.

-Kirk

On Tue, Apr 19, 2011 at 12:43 PM, Greg Landrum <greg.land...@gmail.com> wrote:
> Hi Kirk,
>
> On Tue, Apr 19, 2011 at 6:46 PM, rkdeli...@gmail.com wrote:
>> I just repeated the process - I copied the most recent release to a new directory, copied in the rdkit/Chem/Draw directory from SVN, no build step this time - I get the same error:
>>
>> Traceback (most recent call last):
>>   File "SOMtoHTML_101203.py", line 227, in <module>
>>     create_2D_depiction()
>>   File "SOMtoHTML_101203.py", line 50, in create_2D_depiction
>>     Draw.MolToFile(m, picture_parent_folder+'/'+name+'.png', (picture_size, picture_size) )
>>   File "/opt/RDKit_2011_03_1_up1/rdkit/Chem/Draw/__init__.py", line 56, in MolToFile
>>     import cairo
>> ImportError: No module named cairo
>>
>> I looked at the __init__.py file from the SVN set and I see this:
>>
>> def MolToFile(mol,fileName,size=(300,300),kekulize=True, wedgeBonds=True):
>>   # original contribution from Uwe Hoffmann
>>   import cairo
>>
>> Line 56 is import cairo
>
> I'm really confused... Here:
> http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/rdkit/Chem/Draw/__init__.py?revision=1712&view=markup
> it's different. If you go to that directory and do:
> svn info
> what do you see?
>
>> def MolToImageFile occurs on line 100. What have I done wrong here? Has anyone else out there tested it?
>
> I haven't heard back from anyone yet.
>
> -greg
Re: [Rdkit-discuss] Installation driving me mad (RDKit on Centos 5.4 final)
JP and George - Excellent! I'm glad to hear my CentOS install walk-through is going to good use. 8^)

Greg - I'm reasonably sure that when I tried to build NumPy in the absence of the linear algebra libraries, I received an error and the build failed. Unfortunately, I didn't bother to try them one at a time to verify which was needed. I'll leave that as an exercise for the reader. 8^) I edited the Wiki stating that the libraries may or may not be necessary.

Regarding repos necessary, I see that all but atlas can be found in the base repo, which I assume is the CentOS default. I'm puzzled as to why blas* and atlas* aren't there. I did put in a link to the epel repository RPM, which should cure the yum install problem for all linear algebra libraries.

-Kirk

On Wed, Feb 23, 2011 at 6:04 AM, George Papadatos <gpapada...@gmail.com> wrote:
> Fair enough, I did not know that! However, according to the same documentation, these packages are highly recommended for NumPy and required for SciPy:
> http://scipy.org/Installing_SciPy/Linux#head-9cf6f4b7fe9ba63fc228203c4f28554a74970847
> In any case, here is a repository for CentOS 5/RHEL 5 with the necessary rpms (for those who can't access yum):
> http://download.opensuse.org/repositories/home:/ashigabou/
> After that, Kirk's walk-through has been most helpful.
>
> George
>
> On 23 February 2011 11:12, Greg Landrum <greg.land...@gmail.com> wrote:
>> Let me elaborate on that... from the numpy installation page (http://docs.scipy.org/doc/numpy/user/install.html):
>> "NumPy does not require any external linear algebra libraries to be installed. However, if these are available, NumPy's setup script can detect them and use them for building. A number of different LAPACK library setups can be used, including optimized LAPACK libraries such as ATLAS, MKL or the Accelerate/vecLib framework on OS X."
>>
>> Best,
>> -greg
>>
>> On Wed, Feb 23, 2011 at 12:10 PM, Greg Landrum <greg.land...@gmail.com> wrote:
>>> I'm not convinced of that. I'm pretty sure that I have built numpy on redhat and ubuntu systems without ever installing lapack.
>>>
>>> -greg
>>>
>>> On Wed, Feb 23, 2011 at 12:06 PM, George Papadatos <gpapada...@gmail.com> wrote:
>>>> ...yet you need them to build Numpy...
>>>>
>>>> George
>>>>
>>>> On 23 February 2011 11:03, Greg Landrum <greg.land...@gmail.com> wrote:
>>>>> To be very clear: you do not need *any* of these packages to install the RDKit.
>>>>>
>>>>> -greg
>>>>>
>>>>> On Wed, Feb 23, 2011 at 10:53 AM, JP <jeanpaul.ebe...@inhibox.com> wrote:
>>>>>> Great wiki - I wonder how I missed that. But the first instruction
>>>>>>
>>>>>> sudo yum install atlas, atlas-devel, blas blas-devel lapack lapack-devel
>>>>>>
>>>>>> gives me the following error:
>>>>>>
>>>>>> No package atlas, available.
>>>>>> No package atlas-devel, available.
>>>>>> No package blas available.
>>>>>> No package lapack available.
>>>>>>
>>>>>> Is there a repo I have to add to /etc/yum.repos.d/ ?
>>>>>>
>>>>>> On 22 February 2011 18:41, Robert DeLisle <rkdeli...@gmail.com> wrote:
>>>>>>> What are your environment settings? You should have, at minimum, these:
>>>>>>>
>>>>>>> $RDBASE = the directory where you have installed the RDKit code
>>>>>>> $LD_LIBRARY_PATH = /usr/local/lib:/$RDBASE/lib
>>>>>>> $PYTHONPATH = $RDBASE
>>>>>>>
>>>>>>> At least this worked for me for a CentOS installation, detailed here - http://code.google.com/p/rdkit/wiki/BuildingOnCentOS
>>>>>>>
>>>>>>> Another possibility is your PATH variable. Make sure that /usr/local pathnames precede any /usr options. This will ensure looking into /usr/local first.
>>>>>>>
>>>>>>> There also may be options for cmake that will force it into the correct directory. I've found in the past that even though it says in the initial output that it is looking in the correct location for boost and python, it doesn't necessarily follow its own advice.
>>>>>>>
>>>>>>> -Kirk
>>>>>>>
>>>>>>> On Tue, Feb 22, 2011 at 9:44 AM, JP <jeanpaul.ebe...@inhibox.com> wrote:
>>>>>>>> I ended up not using yum to install Numpy - I installed it from source, which was only slightly painful.
>>>>>>>>
>>>>>>>> import platform; print platform.python_version()
>>>>>>>> # /usr/local/lib/python2.7/platform.pyc matches /usr/local/lib/python2.7/platform.py
>>>>>>>> import platform # precompiled from /usr/local/lib/python2.7/platform.pyc
>>>>>>>> 2.7.0
>>>>>>>> import numpy as N
>>>>>>>> a=N.random.randn(10, 10)
>>>>>>>>
>>>>>>>> In /usr/lib64/ I can find some libpython2.4.so, libpython2.4.so.1.0
>>>>>>>>
>>>>>>>> What should I do?
>>>>>>>>
>>>>>>>> On 22 February 2011 16:23, rkdeli...@gmail.com wrote:
>>>>>>>>> Are you sure that your NumPy installation is going to the correct Python instance? I see from the logs that you have Python 2.7 installed, or at least that is what cmake is finding at /usr/local/lib. You use yum to install NumPy, but the standard installation of Python on CentOS 5.x is 2.4 and it is located in /usr/lib. Which version of Python has NumPy?
>>>>>>>>>
>>>>>>>>> -Kirk
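The question at the bottom of this thread ("Which version of Python has NumPy?") can be answered from inside the interpreter itself. Below is a small sketch, assuming a modern Python 3 (the thread used 2.4 and 2.7, where importlib.util is not available); locate_module and report are hypothetical helper names, not part of RDKit or NumPy.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be imported from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


def report(names=("numpy",)):
    """Build a short report: interpreter path, version, and where each module lives."""
    lines = ["interpreter: %s" % sys.executable,
             "version: %s" % sys.version.split()[0]]
    for name in names:
        lines.append("%s: %s" % (name, locate_module(name) or "not found"))
    return lines


for line in report():
    print(line)
```

Running this once with each interpreter on the machine (for example the system python and the one under /usr/local/bin) shows which of them can see NumPy and from which site-packages directory, which is the mismatch being chased above.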
Re: [Rdkit-discuss] RDKit on CentOs 5
was up and running. If you run into any problems, please post them so that we can (hopefully) help and others can benefit in the future.

-Kirk

On Jan 6, 2011 10:11am, Igor Filippov [Contr] <ig...@helix.nih.gov> wrote:
> Dear Kirk,
>
> Thank you so much! I'm in the process of compiling gcc-4.5.1 right now, having got gmp, mpc, and mpfr built with the older version of gcc. Your instructions have to be preserved for the others; I can't believe I'm the only one using CentOs/RHEL on a server/compute node.
>
> Greg, don't take it as a slam, but compiling the Linux kernel is a walk in the park compared to a recent RDKit. I'm working on it for the second day and I'm barely half-way through the process of installing dependencies. Even without python, the version of gcc which comes with CentOs 5 (4.1.2) cannot compile RDKit. On the other hand, the RPM packages for Fedora have been painless to install; how nice it would be to have the RDKit RPMs for CentOs!
>
> Best,
> Igor
>
> On Wed, 2011-01-05 at 17:41 -0500, Robert DeLisle wrote:
>> I have been able to reproducibly build RDKit on CentOS 5.5, but it required a significant amount of updating of the build components. The attached walk-through script should get you there. I do not recall ever seeing that particular error, however.
>>
>> -Kirk
>>
>> On Wed, Jan 5, 2011 at 1:15 PM, Igor Filippov [Contr] <ig...@helix.nih.gov> wrote:
>>> Dear All,
>>>
>>> Has anyone successfully compiled RDKit on CentOs 5? I'm running into the following error message:
>>>
>>> [ 15%] Building CXX object Code/Numerics/Alignment/Wrap/CMakeFiles/rdAlignment.dir/rdAlignment.cpp.o
>>> /root/RDKit_2010_09_1/Code/Numerics/Alignment/Wrap/rdAlignment.cpp:14:31: error: numpy/arrayobject.h: No such file or directory
>>>
>>> On CentOs 5 arrayobject.h is part of the python-numeric package and it's located in:
>>> /usr/include/python2.4/Numeric/arrayobject.h
>>>
>>> I'm attempting to compile RDKit_2010_09_1, using boost version 1.39.0, x86_64 system.
>>>
>>> Regards,
>>> Igor
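A note on the missing header in the original question: the build is looking for numpy/arrayobject.h, which ships with NumPy, whereas Numeric/arrayobject.h from python-numeric belongs to the older Numeric package and will not satisfy it. NumPy can report where its headers live via numpy.get_include(). The sketch below assumes NumPy is installed for whichever Python runs it; find_arrayobject_header is an illustrative name only.

```python
import os


def find_arrayobject_header():
    """Return the path to numpy/arrayobject.h for this interpreter, or None."""
    try:
        import numpy
    except ImportError:
        return None  # NumPy is not installed for the interpreter running this script
    header = os.path.join(numpy.get_include(), "numpy", "arrayobject.h")
    return header if os.path.isfile(header) else None


print(find_arrayobject_header() or "numpy/arrayobject.h not found for this Python")
```

If this prints a path for the Python that cmake detected, the include directory exists and the remaining question is whether the build is pointed at that same interpreter; if it prints the fallback message, NumPy still needs to be installed for that Python.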
[Rdkit-discuss] RDKit build on Fedora 14
Greg,

After finalizing my build of RDKit on CentOS (as per my previous message thread), I decided to give it a shot on Fedora 14. I'm happy to report that this build goes incredibly smoothly without even a hiccup. The details for Fedora 14's standard install are:

GCC 4.5.1-4
Boost 1.44.0
Python 2.7
NumPy 1.4.1
cmake 2.8.2
flex 2.5.35
bison 2.4.3

I then moved on to imaging and found the standard PIL (v1.1.7) installation did not have Freetype support, but the Freetype2 (v2.4.2) libraries are installed. I rebuilt PIL with the following modification to its setup.py file in order to direct it to the correct libraries and include files.

change line 40 from
FREETYPE_ROOT = None
to
FREETYPE_ROOT = "/usr/lib64", "/usr/include"

I was also able to get aggdraw in place by modifying its setup.py in a similar manner, but you don't have the benefit of separating the library and include directories as with PIL.

change line 21 from:
FREETYPE_ROOT = "../../kits/freetype-2.1.10"
to
FREETYPE_ROOT = "/usr"

and, change line 56 from
library_dirs.append(os.path.join(FREETYPE_ROOT, "lib"))
to
library_dirs.append(os.path.join(FREETYPE_ROOT, "lib64"))

That's it. Easy. 8^)

-Kirk
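The PIL edit described above can also be applied with a few lines of Python instead of by hand, which helps when repeating the build on several machines. This is a rough sketch only: set_freetype_root is a made-up helper, it works on the text of setup.py rather than on a fixed line number, and the tuple form (library directory, include directory) is assumed from the description in this message.

```python
import re


def set_freetype_root(setup_text, lib_dir="/usr/lib64", include_dir="/usr/include"):
    """Rewrite the FREETYPE_ROOT assignment in setup.py source text.

    Returns a (new_text, number_of_lines_changed) pair.
    """
    replacement = 'FREETYPE_ROOT = ("%s", "%s")' % (lib_dir, include_dir)
    pattern = re.compile(r"^FREETYPE_ROOT\s*=.*$", re.MULTILINE)
    # A function is used as the replacement so the paths are inserted literally.
    return pattern.subn(lambda match: replacement, setup_text)
```

Typical use would be to read setup.py, call set_freetype_root on its contents, check that exactly one line changed, and write the result back before running python setup.py build.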
Re: [Rdkit-discuss] Building RDKit on CentOS 5
Peter - This is great! I've only browsed through the script you have, but I do see a few differences. I'll give it a shot now and report back. Thank you so much for posting this.

Greg - I ran ldd on rdBase.so, and here's the output:

libRDGeneral.so.1 => /opt/RDKit_svn_20101115/lib/libRDGeneral.so.1 (0x2ad72659b000)
libRDBoost.so.1 => /opt/RDKit_svn_20101115/lib/libRDBoost.so.1 (0x2ad7267d6000)
libboost_python.so.1.44.0 => /usr/local/lib/libboost_python.so.1.44.0 (0x2ad726ba5000)
libstdc++.so.6 => /usr/local/lib64/libstdc++.so.6 (0x2ad726df7000)
libm.so.6 => /lib64/libm.so.6 (0x2ad72712f000)
libgcc_s.so.1 => /usr/local/lib64/libgcc_s.so.1 (0x2ad7273b2000)
libc.so.6 => /lib64/libc.so.6 (0x2ad7275c9000)
libutil.so.1 => /lib64/libutil.so.1 (0x2ad72792)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2ad727b23000)
libdl.so.2 => /lib64/libdl.so.2 (0x2ad727d3f000)
librt.so.1 => /lib64/librt.so.1 (0x2ad727f43000)
/lib64/ld-linux-x86-64.so.2 (0x0031f700)

It does look like it is referring to the correct instances of what I've built. There are a few system level C/C++ library references, but I'm not seeing anything odd here. What's your take on it?

-Kirk

On Wed, Nov 17, 2010 at 6:34 AM, Peter Schmidtke <pschmid...@ub.edu> wrote:
> Hey Greg,
>
> yep that would be great, as right now they are only on a group internal blog ;) I saw that you recently changed your linux build instructions (concerning database things, boost numerical bindings etc...), but I did this before this came out ;) First let's see if Robert comes through the install process without major problems and then you can post it on your wiki (I might have forgotten some stuff). Some things are based on installing pycuda on those machines, this is why signals and things like that are compiled with boost (might be worth mentioning somehow in case people need both).
>
> ++
> Peter
>
> On 17/11/2010, at 14:25, Greg Landrum wrote:
>> Dear Peter,
>>
>> Thanks for posting these very detailed instructions. Do you mind if I post them to the wiki (with credit of course) to make them easier to find? I made a few comments and suggestions below:
>>
>> On Wed, Nov 17, 2010 at 11:19 AM, Peter Schmidtke <pschmid...@ub.edu> wrote:
>>> Dear Robert,
>>>
>>> I recently also ran into several problems while installing rdkit on a fresh Centos 5.3. It's a real headache. Anyway, this time I've written up a guide of how to do it step by step; I hope I didn't forget anything in the end. However, now it works just fine on our Centos machines. Here's the step by step installing guide:
>>>
>>> Centos is a stable but not very user-friendly OS. This becomes obvious when one wants to install python packages like pycuda etc... Centos comes with a very old python version, 2.4, but lots of newer features, like pycuda, require a newer python version. Let's start the lengthy install process under Centos:
>>>
>>> Installing Python 2.6 or newer
>>>
>>> If you already have python2.7 installed, please check that it was installed with --enable-shared. If this is the case you should have libpython2.7.so in /usr/local/lib. If not, you should have libpython2.7.a. If the second is the case, you have to install python2.7 in the following way:
>>>
>>> Download the current version from python (source code). Like with 2.6 or 2.7 (don't grab the 3.x for now):
>>> wget http://www.python.org/ftp/python/2.7/Python-2.7.tgz
>>> Next untar and unzip the file, go to the Python-2.7 directory and issue:
>>> ./configure --enable-shared; make; sudo make install
>>> This installs python in the /usr/local/ directory.
>>>
>>> Add the RPMForge repo to yum:
>>> wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm
>>> su -c 'rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm'
>>>
>>> Then install atlas, lapack, blas:
>>> yum install atlas-c++.x86_64 atlas-c++-devel.x86_64 lapack.x86_64 lapack-devel.x86_64 blas.x86_64 blas-devel.x86_64
>>>
>>> Now we can install fftw3:
>>> yum install fftw3.x86_64 fftw3-devel.x86_64
>>>
>>> Now we could potentially install numpy 1.3 or 1.4, but as python2.7 is brand new there are some problems. I downloaded:
>>> wget http://sourceforge.net/projects/numpy/files/NumPy/1.3.0/numpy-1.3.0.tar.gz/download
>>> then untar and unzip this whole thing and go to the numpy directory.
>>> Download the following patch:
>>> wget http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/dev-python/numpy/files/numpy-1.4.0-python-2.7.patch
>>> and apply it in this directory using:
>>> patch -p0 < numpy-1.4.0-python-2.7.patch
>>> Now build numpy using
>>> python setup.py build; python setup.py install
>>> Numpy should now be accessible from python2.7; simply try an import numpy after launching python to check.
>>>
>>> First we need to install the boost libraries and their python bindings. Download boost to your downloads
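The ldd check at the top of this message is done by eye; with more libraries it can be easier to let a script flag anything that resolves outside the directories holding the freshly built toolchain. A minimal sketch, assuming the usual "name => path (address)" layout of ldd output; parse_ldd and outside_prefix are hypothetical helpers, and the default prefixes simply mirror the /usr/local and /opt locations used in this thread.

```python
def parse_ldd(output):
    """Map each library name in ldd output to the path it resolved to (or None)."""
    resolved = {}
    for raw in output.splitlines():
        line = raw.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "name => path" form
        name, _, rest = line.partition("=>")
        target = rest.split("(")[0].strip()
        resolved[name.strip()] = target if target and target != "not found" else None
    return resolved


def outside_prefix(resolved, prefixes=("/usr/local/", "/opt/")):
    """List libraries that resolved somewhere other than the given prefixes."""
    return sorted(name for name, path in resolved.items()
                  if path is not None and not path.startswith(prefixes))
```

Feeding it the output above would list libm, libc, libutil, libpthread, libdl and librt as system-level references, while libstdc++ and libgcc_s come from /usr/local/lib64, matching the reading given in the message.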
Re: [Rdkit-discuss] Building RDKit on CentOS 5
: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01107.html

In /agg2/include/agg_array.h change line #523 from this:

unsigned align = (alignment - unsigned(ptr) % alignment) % alignment;

to this:

unsigned align = (alignment - (unsigned long)(ptr) % alignment) % alignment;

I can then get aggdraw to build, but running the selftest.py gives a segmentation fault. If I go ahead and install, it seems to work just fine and the images produced from RDKit are much improved.

The build process is:

export CFLAGS=-fpermissive
python setup.py build_ext -i
python setup.py install
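Some background on why the one-line aggdraw change matters on a 64-bit system: a pointer is 8 bytes there while a C unsigned int is typically 4, so unsigned(ptr) would discard the upper half of the address, which is presumably what the compiler objects to unless -fpermissive is given. The sizes can be checked from Python with ctypes. This is a sketch for illustration only; fits_pointer and as_unsigned_int are made-up names, and the sample address is one of the load addresses from the ldd output earlier in this thread.

```python
import ctypes

# Size of a C pointer on the platform running this script (8 on 64-bit Linux).
POINTER_BYTES = ctypes.sizeof(ctypes.c_void_p)


def fits_pointer(int_type):
    """True if a ctypes integer type is wide enough to hold a pointer on this platform."""
    return ctypes.sizeof(int_type) >= POINTER_BYTES


def as_unsigned_int(address):
    """Show what survives when an address is squeezed into a C unsigned int."""
    return ctypes.c_uint(address).value


print("pointer: %d bytes" % POINTER_BYTES)
print("unsigned int holds a pointer:", fits_pointer(ctypes.c_uint))
print("unsigned long holds a pointer:", fits_pointer(ctypes.c_ulong))
print(hex(as_unsigned_int(0x2AD726DF7000)))
```

On 64-bit Linux, unsigned long is pointer-sized, which is why the cast to (unsigned long) compiles cleanly there; that is not guaranteed on every platform, so treat the cast as a Linux-specific fix.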
Re: [Rdkit-discuss] Building RDKit on CentOS 5
I'm sorry for the slow response. Busy day.

For this install, I started on a CentOS 5.5 system that was up to date with all package upgrades. Following is what I've done so far:

- Installed blas, blas-devel, lapack, lapack-devel through yum.
- I had problems in the past with the standard GCC package on CentOS, which is version 4.1.2, so I rebuilt the GCC 4.4.5 package and included mpfr 2.4.1.
- Installed cmake 2.8.2.
- Installed flex 2.5.35.
- CentOS's Python installation is v2.4.1, so I built and installed 2.7. Due to errors found later in the process, I built this with the -fPIC switch and also enabled Unicode UCS4 support:
  ./configure CFLAGS=-fPIC --enable-unicode=ucs4
- Built and installed NumPy 1.5.0.
- Boost on CentOS 5.5 is v1.33, so I built and installed boost 1.44 with the following commands:
  ./bootstrap.sh --with-libraries=python,regex
  ./bjam address-model=64 install

Finally, with RDKit I have $LD_LIBRARY_PATH

On Mon, Nov 15, 2010 at 10:11 PM, rkdeli...@gmail.com wrote:
> No, I made sure to include the address-model=64 switch to bjam. Tomorrow when I get in I'll update the thread with all the steps I've followed.
>
> -Kirk
>
> On Nov 15, 2010 9:52pm, Greg Landrum <greg.land...@gmail.com> wrote:
>> Kirk,
>>
>> On Tue, Nov 16, 2010 at 12:38 AM, Robert DeLisle <rkdeli...@gmail.com> wrote:
>>> Yes, that is also true. The error in my most recent messages stems from the default build of Python supporting Unicode UCS2, while boost apparently expects UCS4. A rebuild of Python with UCS4 enabled fixed that problem.
>>>
>>> Now I get a similar error related to Py_InitModule4 not being defined. From what I can find, this is a 32-bit vs. 64-bit problem in which this was defined as Py_InitModule4_64 in the 64-bit Python libraries, but that change may not have cascaded to all necessary parts of the build process. Most of the fixes I've found involve substantial changes to the accessing code, but I'm still looking for a better option.
>>
>> Could it be that the boost libraries you are using were not built in 64bit mode? I've managed to force a 64bit build in the past with the following command line:
>> ./bjam address-model=64 cflags=-fPIC cxxflags=-fPIC install
>>
>> Best Regards,
>> -greg
Re: [Rdkit-discuss] Building RDKit on CentOS 5
---------- Forwarded message ----------
From: Robert DeLisle rkdeli...@gmail.com
Date: Tue, Nov 16, 2010 at 3:03 PM
Subject: Re: Re: [Rdkit-discuss] Building RDKit on CentOS 5
To: Greg Landrum greg.land...@gmail.com, Robert DeLisle rkdeli...@gmail.com

I'm sorry for the slow response. Busy day.

For this install, I started on a CentOS 5.5 system that was up to date with all package upgrades. Here is what I've done so far:

Installed blas, blas-devel, lapack, and lapack-devel through yum.

I had problems in the past with the standard GCC package on CentOS, which is version 4.1.2, so I rebuilt the GCC 4.4.5 package and included mpfr 2.4.1.

Installed cmake 2.8.2.

Installed flex 2.5.35.

CentOS's Python installation is v2.4.1, so I built and installed 2.7. Due to errors found later in the process, I built this with the -fPIC switch and also enabled Unicode UCS4 support:

    ./configure CFLAGS=-fPIC --enable-unicode=ucs4

Built and installed NumPy 1.5.0.

Boost on CentOS 5.5 is v1.33, so I built and installed Boost 1.44 with the following commands:

    ./bootstrap.sh --with-libraries=python,regex
    ./bjam address-model=64 install

Finally, with RDKit I have $LD_LIBRARY_PATH set with /usr/local/lib first to avoid conflicts with the system packages. GCC and Python are both in /usr/local, and these are the instances referred to by my user and root. For RDKit, I ran the following commands:

    cmake -DBoost_USE_STATIC_LIBS=OFF -DBOOST_ROOT=/usr/local ..
    make
    make install

I have also installed FreeType2 and PIL - both seem fine with Python 2.7. I attempted aggdraw, but the self-test always seems to give me a segmentation fault. I found that I can build aggdraw using the code as-is as long as I include CFLAGS=-fpermissive, or there is a one-line code change that makes the compiler happy on 64-bit. Either way I still get the seg fault upon testing.

Regarding RDKit, the first group of errors I received consisted of the one requiring Python to be built with -fPIC and what seems to be the typical USE_STATIC_LIBS error. Initially, an -fPIC error would occur at around 87%, and it was not cured by the Python rebuild or any other modification. I found that switching to the SVN code solved the problem. Upon inspecting the error logs, it appeared that the build process was always referring to the system Boost install and not my new install, despite having set -DBOOST_ROOT correctly.

Currently, the build goes to completion, but upon issuing 'from rdkit import Chem' within Python 2.7, I get an error related to Py_InitModule4 not being defined. From a little Google searching for Py_InitModule4, the only thing I've seen thus far is a conflict in various packages between code built on 32-bit and 64-bit systems. It seems that this name has been renamed to Py_InitModule4_64 on 64-bit systems, but that change may not be reflected in all the code that needs it. It seemed a widespread problem and not specific to any one application or library, which makes me think it is something in a Python include file.

I appreciate any help that anyone can provide. Please let me know if I need to clarify or add any details.

-Kirk

On Mon, Nov 15, 2010 at 10:11 PM, rkdeli...@gmail.com wrote:
> No, I made sure to include the address-model=64 switch to bjam. Tomorrow when I get in I'll update the thread with all the steps I've followed.
>
> -Kirk
>
> On Nov 15, 2010 9:52pm, Greg Landrum greg.land...@gmail.com wrote:
>> Kirk,
>>
>> On Tue, Nov 16, 2010 at 12:38 AM, Robert DeLisle rkdeli...@gmail.com wrote:
>>> Yes, that is also true. The error in my most recent message stems from the fact that the default build of Python supports Unicode UCS2, but apparently Boost expects UCS4. A rebuild of Python with UCS4 enabled fixed that problem.
>>>
>>> Now I get a similar error related to Py_InitModule4 not being defined. From what I can find, this is a 32-bit vs. 64-bit problem in which this was defined as Py_InitModule4_64 in the 64-bit Python libraries, but that change may not have cascaded to all necessary parts of the build process. Most of the suggested fixes involve substantial changes to the accessing code, but I'm still looking for a better option.
>>
>> Could it be that the boost libraries you are using were not built in 64bit mode? I've managed to force a 64bit build in the past with the following command line:
>>
>>     ./bjam address-model=64 cflags=-fPIC cxxflags=-fPIC install
>>
>> Best Regards,
>> -greg
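Greg's question about whether the Boost libraries were built in 64-bit mode can be checked directly on the files. The following is a minimal sketch, not something posted in the thread: it is written for Python 3 (the thread itself uses Python 2.7), the function names are hypothetical, and it relies only on the ELF convention that byte 4 of the file header (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def elf_word_size(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF.

    Hypothetical helper: it only inspects the first five bytes of the file.
    """
    with open(path, "rb") as fh:
        header = fh.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # EI_CLASS: 1 means a 32-bit object, 2 means a 64-bit object.
    return {b"\x01": 32, b"\x02": 64}.get(header[4:5])


def interpreter_word_size():
    """Pointer size of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8
```

Calling elf_word_size() on a shared library such as libboost_python.so and comparing the result with interpreter_word_size() would show whether the library and the interpreter agree. Static .a archives are not ELF files themselves, so the helper returns None for them.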
Re: [Rdkit-discuss] Building RDKit on CentOS 5
I apologize for that double post before - itchy send finger.

Here's the specific error I'm getting after the build process has otherwise succeeded.

    >>> from rdkit import Chem
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/RDKit_svn_20101115/rdkit/Chem/__init__.py", line 18, in <module>
        from rdkit import rdBase
    ImportError: /usr/lib64/libboost_python.so.2: undefined symbol: Py_InitModule4
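The ImportError text itself names the shared library that failed to resolve the symbol, and here it is the system copy in /usr/lib64 rather than the Boost built under /usr/local. A small sketch of pulling that information out of the message follows; the helper names are hypothetical and the pattern is only an assumption about this particular message format, not a general parser for loader errors.

```python
import re

# Matches messages of the form "<library path>: undefined symbol: <name>".
_UNDEFINED = re.compile(r"^(?P<lib>\S+): undefined symbol: (?P<symbol>\S+)$")


def parse_undefined_symbol(message):
    """Return (library_path, symbol) or None if the message has another shape."""
    match = _UNDEFINED.match(message.strip())
    if match is None:
        return None
    return match.group("lib"), match.group("symbol")


def loaded_from_prefix(lib_path, prefix):
    """True if lib_path lives under prefix, for example '/usr/local'."""
    return lib_path.startswith(prefix.rstrip("/") + "/")
```

Applied to the message above, parse_undefined_symbol() yields the path /usr/lib64/libboost_python.so.2, and loaded_from_prefix(path, "/usr/local") is false, which points at the library search order rather than at the RDKit build.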
[Rdkit-discuss] Building RDKit on CentOS 5
I've been working to build RDKit on CentOS 5, and I'm hitting a very common error. Unfortunately, none of the standard fixes have helped.

Details:

The error that I'm seeing is this:

    [ 82%] Building CXX object Code/GraphMol/SLNParse/CMakeFiles/SLNParse.dir/SLNParse.cpp.o
    [ 83%] Building CXX object Code/GraphMol/SLNParse/CMakeFiles/SLNParse.dir/SLNAttribs.cpp.o
    [ 83%] Building CXX object Code/GraphMol/SLNParse/CMakeFiles/SLNParse.dir/sln.tab.cpp.o
    [ 84%] Building CXX object Code/GraphMol/SLNParse/CMakeFiles/SLNParse.dir/lex.yysln.cpp.o
    Linking CXX shared library libSLNParse.so
    /usr/bin/ld: /usr/lib/../lib64/libboost_regex.a(instances.o): relocation R_X86_64_32 against `boost::object_cache<boost::re_detail::cpp_regex_traits_base<char>, boost::re_detail::cpp_regex_traits_implementation<char> >::do_get(boost::re_detail::cpp_regex_traits_base<char> const&, unsigned long)::s_data' can not be used when making a shared object; recompile with -fPIC
    /usr/lib/../lib64/libboost_regex.a: could not read symbols: Bad value
    collect2: ld returned 1 exit status
    make[2]: *** [Code/GraphMol/SLNParse/libSLNParse.so] Error 1
    make[1]: *** [Code/GraphMol/SLNParse/CMakeFiles/SLNParse.dir/all] Error 2
    make: *** [all] Error 2

I've taken the standard step of building Python (v2.7) with the -fPIC flag. Specifically, I attached CFLAGS=-fPIC to configure in the Python build. This solved the first instance of this type of error, which occurred at about 3%.

I've also tried the two fixes for Boost with the following command line to build RDKit:

    cmake -DBOOST_ROOT=/usr/local -DBoost_USE_STATIC_LIBS=OFF ..

I still get this error, and I notice that the Boost libraries being referred to are actually the system installation in /usr/lib64 and not those that I've built in /usr/local/lib. It seems that I can't force make to look in the right location.

Any tips are greatly appreciated.

-Kirk

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Building RDKit on CentOS 5
Yep, I've definitely done that. I've even gone so far as to wipe out the directory entirely and start with a fresh RDKit directory. I also looked into the cache file and saw that the library directories appear to be set as /usr/local/lib and /usr/local/lib64, but once the error occurs, it refers to /usr/lib64. I can't seem to find any reason for this.

-Kirk

On Mon, Nov 15, 2010 at 3:26 PM, Eddie Cao cao.yi...@gmail.com wrote:
> Have you tried removing the CMake cache file before rerunning cmake?
>
>     rm -f CMakeCache.txt
>
> After rerunning cmake, take a look at that file again and make sure things like Boost_INCLUDE_DIR and Boost_LIBRARY_DIRS all point to /usr/local/include and /usr/local/lib, etc.
>
> Eddie
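Eddie's suggestion to inspect the cache file after rerunning cmake can be automated. The sketch below is an illustration under stated assumptions: it treats CMakeCache.txt as lines of the form NAME:TYPE=VALUE with // and # comment lines, and the helper names are hypothetical. It lists every cache entry whose name starts with "Boost" and whose value is an absolute path outside the expected prefix.

```python
def read_cmake_cache(path):
    """Parse NAME:TYPE=VALUE lines from a CMakeCache.txt into a dict."""
    entries = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines and the // and # comment lines cmake writes.
            if not line or line.startswith(("//", "#")):
                continue
            key, sep, value = line.partition("=")
            if not sep:
                continue
            entries[key.split(":", 1)[0]] = value
    return entries


def boost_entries_outside(entries, prefix):
    """Return Boost-related entries whose absolute paths are not under prefix."""
    root = prefix.rstrip("/")
    outside = {}
    for name, value in entries.items():
        if not name.lower().startswith("boost") or not value.startswith("/"):
            continue
        if value != root and not value.startswith(root + "/"):
            outside[name] = value
    return outside
```

Running boost_entries_outside(read_cmake_cache("CMakeCache.txt"), "/usr/local") in the build directory would show at a glance whether any Boost path still points at /usr/lib64.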
Re: [Rdkit-discuss] Building RDKit on CentOS 5
It must be something in the release version of RDKit. I just grabbed the SVN version, put it in the same location, followed the same procedures, and it compiled fine without any other changes on my part.

Greg - any ideas what the difference is here? Not that it matters given that the SVN version is working, but just for curiosity's sake.

Sadly, now I get this from within Python:

    >>> from rdkit import Chem
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/RDKit_svn_20101115/rdkit/Chem/__init__.py", line 18, in <module>
        from rdkit import rdBase
    ImportError: /usr/lib64/libboost_python.so.2: undefined symbol: PyUnicodeUCS4_FromEncodedObject

-Kirk
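The PyUnicodeUCS4_FromEncodedObject symbol in this traceback indicates that the library was compiled against a UCS4 ("wide") Python, while the interpreter importing it was a UCS2 ("narrow") build. A quick way to see which kind of interpreter is running is sys.maxunicode: it is 0xFFFF on narrow Python 2 builds and 0x10FFFF on wide builds (and always 0x10FFFF on Python 3.3 and later). The helpers below are a hypothetical sketch, not part of RDKit or Boost.

```python
import sys


def unicode_build(maxunicode=None):
    """Classify an interpreter as 'UCS2' or 'UCS4' from its sys.maxunicode."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


def required_build(symbol):
    """Infer the Unicode width a library expects from a missing symbol name.

    Returns 'UCS2', 'UCS4', or None when the symbol is not Unicode-related.
    """
    for width in ("UCS2", "UCS4"):
        if symbol.startswith("PyUnicode" + width + "_"):
            return width
    return None
```

If required_build() of the missing symbol differs from unicode_build() of the interpreter, the mismatch is between how the library and the interpreter were configured, which matches the fix Kirk reports: rebuilding Python with --enable-unicode=ucs4.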
Re: [Rdkit-discuss] Building RDKit on CentOS 5
Yes, that is also true. The error in my most recent message stems from the fact that the default build of Python supports Unicode UCS2, but apparently Boost expects UCS4. A rebuild of Python with UCS4 enabled fixed that problem.

Now I get a similar error related to Py_InitModule4 not being defined. From what I can find, this is a 32-bit vs. 64-bit problem in which this was defined as Py_InitModule4_64 in the 64-bit Python libraries, but that change may not have cascaded to all necessary parts of the build process. Most of the suggested fixes involve substantial changes to the accessing code, but I'm still looking for a better option.

On Mon, Nov 15, 2010 at 4:14 PM, Eddie Cao cao.yi...@gmail.com wrote:
> Make sure /usr/local/lib appears before /usr/lib64 in your LD_LIBRARY_PATH. It seems python import loads the system boost rather than your custom boost.
>
> -Eddie
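Eddie's advice about the order of directories in LD_LIBRARY_PATH is easy to verify programmatically. The following is a rough sketch with hypothetical helper names; it only compares positions in the colon-separated variable and does not model other loader inputs such as an RPATH embedded in a library or the ld.so cache.

```python
import os


def first_index(path_value, directory):
    """Position of directory in a colon-separated search path, or None."""
    parts = [p.rstrip("/") for p in path_value.split(":") if p]
    target = directory.rstrip("/")
    return parts.index(target) if target in parts else None


def takes_precedence(path_value, preferred, other):
    """True if preferred is listed and comes before other (or other is absent)."""
    i = first_index(path_value, preferred)
    j = first_index(path_value, other)
    if i is None:
        return False
    return j is None or i < j


def check_environment(preferred="/usr/local/lib", other="/usr/lib64"):
    """Apply the check to the LD_LIBRARY_PATH of the current process."""
    return takes_precedence(os.environ.get("LD_LIBRARY_PATH", ""), preferred, other)
```

A result of False from check_environment() would be consistent with Python picking up the system libboost_python from /usr/lib64 at import time.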
Re: [Rdkit-discuss] How to trap the exceptions in RDKit?
The easiest trap is simply this:

    if m is None:
        # error handling code
        pass

The problem that I have had is that this will effectively skip bad molecules, but in a large SD file it is difficult to find out which molecules they were.

    from rdkit import Chem

    sd = Chem.SDMolSupplier('test.sdf')
    for m in sd:
        if m is None:
            # how do I get more information about the broken molecules?
            pass
        else:
            # do the normal stuff.
            pass

On Fri, Oct 22, 2010 at 8:51 AM, nikolaus.sti...@novartis.com wrote:
> Hi,
>
> your molecule itself is not ok - the Sn has a larger valence than permitted. There was a similar case recently about phosphorus. I would guess that you should fix your molecule first. In your code example below your molecule m is None and hence the rest will not work.
>
> Hope that helps
> Cheers
> Nik
>
> From: sridhar.kuntamukk...@thomsonreuters.com
> Date: 10/22/2010 04:41 PM
> To: rdkit-discuss@lists.sourceforge.net
> Subject: [Rdkit-discuss] How to trap the exceptions in RDKit?
>
> Hi,
>
> I have the following code which raises an exception because the molecule is not up to its expectations. But I can’t find a way to trap the exception. Can someone suggest one, please?
>
>     from rdkit import Chem
>     from rdkit.Chem import AvailDescriptors
>     from rdkit.Chem import Crippen
>
>     m = Chem.MolFromSmiles('c1ccc2c(c1)/C=N/c3c3S[Ti]O2.[CH]1[CH][CH][CH][CH]1.Cl[Sn-](Cl)(Cl)(Cl)Cl')
>
>     # I tried this way
>     molog = Crippen.MolLogP(m)
>     print molog
>
>     # and also this way first
>     if AvailDescriptors.descDict['MolLogP'](m):
>         mollogp = AvailDescriptors.descDict['MolLogP'](m)
>
> I also wanted to calculate NumHDonors and NumHAcceptors. But if it failed on one descriptor, will it fail on other descriptors as well? Any suggestions?
>
> Thanks
> Sridhar
>
> p.s. to Eddie. It turned out my server has the 11g client and the installation on the server works fine. I guess I must have missed a line or two of the instructions saying that the client must be 11g and not the DB itself. My PC had Oracle 9 and 10 clients.
>
> From: Eddie Cao [mailto:cao.yi...@gmail.com]
> Sent: Wednesday, October 20, 2010 4:26 PM
> To: Kuntamukkula, Sridhar (HlthcrScience)
> Cc: rdkit-disc...@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] [Rdkit-devel] How to build the RDKit?
>
> Hi,
>
> Not being an Oracle user, I cannot give you a concrete answer, but a quick Google search indicates that it might be a version inconsistency between the client and the server. Are you sure you are connecting to 11g? Please contact your database administrator for problems regarding the Oracle database, or ask the folks on the cx_Oracle mailing list: http://lists.sourceforge.net/lists/listinfo/cx-oracle-users
>
> -Eddie
>
> On Oct 20, 2010, at 11:03 AM, sridhar.kuntamukk...@thomsonreuters.com wrote:
>
> Hi,
>
> I have downloaded the “Windows x86 Installer (Oracle 11g, Python 2.5)” (http://prdownloads.sourceforge.net/cx-oracle/cx_Oracle-5.0.4-11g.win32-py2.5.msi?download) to my PC and installed it. From the Python command line, when I try to connect to Oracle, I get the following error.
>
>     C:\RDkit>python
>     Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import cx_Oracle
>     >>> con = cx_Oracle.connect(user, pwd, chemdev)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     cx_Oracle.DatabaseError: ORA-24315: illegal attribute type
>
> I originally used connstr=user/p...@chemdev' and this gave me the same error as above. Then I found the above syntax in the cx_oracle_doc folder’s readme.txt, and I am lost now.
>
> My PC has Windows XP, Oracle is on a different server (with a TNS entry “chemdev”), and I just added a TNS_ADMIN registry entry and the path of TNS_ADMIN to the PATH environment variable. From a command prompt, sqlplus user/p...@chemdev works fine.
>
> Any thoughts?
>
> Many thanks
> Sridhar
>
> From: nikolaus.sti...@novartis.com [mailto:nikolaus.sti...@novartis.com]
> Sent: Wednesday, October 20, 2010 2:07 AM
> To: Eddie Cao
> Cc: rdkit-de...@lists.sourceforge.net; Kuntamukkula, Sridhar (HlthcrScience); rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-devel] How to build the RDKit?
>
> Hi Sridhar,
>
> congrats on getting things working. One more comment - maybe you want to post these kinds of questions to the discuss list rather than the devel list. It is much more populated and you will for sure get replies quicker.
>
> Cheers
> Nik
>
> From: Eddie Cao cao.yi...@gmail.com
> Date: 10/19/2010 11:49 PM
> To: sridhar.kuntamukk...@thomsonreuters.com
> Cc: rdkit-de...@lists.sourceforge.net
> Subject: Re: [Rdkit-devel] How to build the RDKit?
>
> Hi Sridhar,
>
> Congratulations! If
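On Kirk's question of finding out which records in a large SD file failed: since iterating over the supplier yields None for each molecule that could not be parsed, recording the position of each None is one way to identify the broken entries afterwards. The sketch below is deliberately generic and does not import RDKit; find_failed_records is a hypothetical helper that works on any iterable following the same "None on failure" convention, and the list in the demo is a stand-in for a real supplier.

```python
def find_failed_records(supplier):
    """Split an iterable of parsed molecules into successes and failures.

    Returns (good, failed), where good is a list of (index, molecule) pairs
    and failed is a list of the zero-based positions that yielded None.
    """
    good, failed = [], []
    for index, mol in enumerate(supplier):
        if mol is None:
            failed.append(index)
        else:
            good.append((index, mol))
    return good, failed


if __name__ == "__main__":
    # Stand-in data: strings play the role of molecules, None marks a failure.
    demo = ["mol-a", None, "mol-c", None]
    parsed, broken = find_failed_records(demo)
    print("records that failed to parse:", broken)
```

With RDKit the same function could presumably be called as find_failed_records(Chem.SDMolSupplier('test.sdf')); the returned positions then identify which records of the file to inspect by hand.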
Re: [Rdkit-discuss] Error depicting a smiles string
Greg,

I found the files of interest and ran a few tests. The files resulting from the tests are in the attached archive, and here are the details.

The structures in question came from Shoichet's non-aggregators set, which was available on his web page. My original intent was to convert the SMILES files from the Shoichet set to SDF. This went smoothly enough until I had to process the SDF for a different purpose. Four structures were found to cause problems.

In the attached archive, each offending structure has 5 associated files, named according to the NGC ID associated with the original SMILES:

.smi - The original SMILES.

.sdf - The result I had found in my SMILES to SDF conversion, having nan as the atom coordinates.

.mol - Generated manually today by:

    m = Chem.MolFromSmiles('offending SMILES')
    AllChem.Compute2DCoords(m)
    print >>file('blah.mol','w+'), Chem.MolToMolBlock(m)

_fix.smi - This is the RDKit-generated SMILES for the structure.

_fix.mol - The result of the following after the code snippet above:

    m = Chem.MolFromSmiles(Chem.MolToSmiles(m))
    AllChem.Compute2DCoords(m)
    print >>file('blah_fix.mol','w+'), Chem.MolToMolBlock(m)

Only 14662 did not result in a fixed mol file. Interestingly, the first bad conversion only has nan for the coordinates of the platinum hexachloride. After the SMILES round-trip, all coordinates are nan.

Please let me know if you need any further details.

-Kirk

On Sat, May 1, 2010 at 10:24 PM, Greg Landrum greg.land...@gmail.com wrote:
> On Fri, Apr 30, 2010 at 12:56 PM, Greg Landrum greg.land...@gmail.com wrote:
>> I don't see any problems in your script, so I have to assume that it's a problem with the binary you're using. I'm travelling and don't have a windows machine handy, so this will have to wait until I'm back home this weekend.
>
> Ok, I was able to reproduce this on my windows box. It's clearly a problem with the windows build:
>
>     In [29]: m = Chem.MolFromSmiles('OC(=O)C1CCCC1')
>
>     In [30]: AllChem.Compute2DCoords(m)
>     Out[30]: 0
>
>     In [31]: print Chem.MolToMolBlock(m)
>     -------> print(Chem.MolToMolBlock(m))
>
>     RDKit 2D
>
>     8 8 0 0 0 0 0 0 0 0999 V2000
>     -1.#IND1.#QNB0. O 0 0 0 0 0 0 0 0 0 0 0 0
>     -1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
>     -1.#IND1.#QNB0. O 0 0 0 0 0 0 0 0 0 0 0 0
>     -1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
>     -1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
>     -1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
>     -1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
>     -1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
>     1 2 1 0
>     2 3 2 3
>     2 4 1 0
>     4 5 1 0
>     5 6 1 0
>     6 7 1 0
>     7 8 1 0
>     8 4 1 0
>     M END
>
> I will look into this and see where the problem lies.
>
> Note: whatever is going on here doesn't affect every depiction; other molecules do end up with correct coordinates.
>
> Best Regards,
> -greg

nan.tgz
Description: GNU Zip compressed data
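Mol blocks like the one above can be screened for unusable coordinates before they cause trouble downstream. The sketch below is an illustration under stated assumptions: it expects a well-formed V2000 mol block (counts on the fourth line, atom count in its first three columns, x/y/z in three 10-character columns of each atom line), and bad_coordinate_atoms is a hypothetical helper. A coordinate counts as bad when it is nan or infinite, or when it is not a parseable number at all, as with the Windows-style -1.#IND output.

```python
import math


def bad_coordinate_atoms(molblock):
    """Return 1-based indices of atoms whose x, y or z is not a finite number."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])  # first field of the V2000 counts line
    bad = []
    for index, line in enumerate(lines[4:4 + n_atoms], start=1):
        fields = (line[0:10], line[10:20], line[20:30])
        try:
            finite = all(math.isfinite(float(field)) for field in fields)
        except ValueError:
            # Text such as "-1.#IND" cannot be parsed as a float at all.
            finite = False
        if not finite:
            bad.append(index)
    return bad
```

An empty list means every atom has usable coordinates; a partial list would correspond to the case Kirk describes, where only one fragment of the structure came back as nan.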
Re: [Rdkit-discuss] RDKit and PIL
Oops! I meant Q4_2009.

I was working on a Windows system with Python 2.6. I did not build PIL from source - I simply downloaded the Windows installer. Reverting to version 1.1.6 was the only change I made, and it fixed the font error.

I have another Linux system with the same font error that I want to check. I don't remember which version of PIL was installed on that one, but I'm reasonably sure it was 1.1.7. Once I test that one, I'll let you know if I get a similar result. I'll also capture the complete error message on the Windows system and send that to you.

I hope your vacation was good.

-Kirk

On Sat, Apr 3, 2010 at 8:19 AM, Greg Landrum greg.land...@gmail.com wrote:

Hi Kirk,

On Fri, Apr 2, 2010 at 2:47 PM, rkdeli...@gmail.com wrote:

While updating various parts of my system, I found that the latest version of RDKit (Q3_2009) combined with the latest version of PIL (1.1.7) leads to errors when trying to execute Draw.MolToImageFile(...)

FYI: there is a Q4_2009 version.

The error stated "cannot load font" and no image was generated. When I downgraded to PIL v1.1.6, all was fine. Nothing mission critical here, just an FYI.

Thanks for the information. Which system was this on? Did you build PIL both times yourself or get it in binary form? The reason I ask is that 1.1.7 works fine for me on the Mac:

[3] m = Chem.MolFromSmiles('c1n1')
[4] Draw.MolToImageFile(m,'blah.png')
[5] import Image
[6] Image.VERSION
Out[6]: '1.1.7'

The "cannot load font" type errors are, I think, generally related to freetype, so I'm a bit surprised that one version would work and the other not.

Best regards,
-greg
Re: [Rdkit-discuss] beta of Q4 2009 release up
Gianluca,

Interesting. My failed Fedora 12 (32-bit) build did not have any issues with PYTHON_INCLUDE_DIR but rather the CMake - Boost library issue that Greg describes.

-Kirk

On Wed, Jan 20, 2010 at 2:48 AM, Gianluca Sforna gia...@gmail.com wrote:

On Wed, Jan 20, 2010 at 6:03 AM, Greg Landrum greg.land...@gmail.com wrote:

Please note that the new release supports cmake-based builds and is, consequently, much easier to build than before. Notes on how to do builds on linux/mac are here:
http://code.google.com/p/rdkit/wiki/BuildingWithCmake
Windows instructions will be here (I'm still working on these):
http://code.google.com/p/rdkit/wiki/BuildingOnWindows_2009Q4

I tried a build in my Fedora 12 x86_64, but a simple

    mkdir build; cd build; cmake ..; make

failed because PYTHON_INCLUDE_DIR was not correctly set. Is there a reason why we are doing that hackery to find python libs?

BTW, the following patch fixed my build:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2f5be8f..f42198b 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -24,39 +24,10 @@ set(RDKit_PythonDir ${CMAKE_SOURCE_DIR}/rdkit)
 # defines macros: rdkit_python_extension, rdkit_test
 include(RDKitUtils)
-#---
-# pull in python:
-# start with a bit of hackery to allow the user to provide their own
-# path to python:
-if(PYTHON_LIBRARIES)
-  set(oPYTHON_LIBRARIES ${PYTHON_LIBRARIES})
-endif(PYTHON_LIBRARIES)
-if(PYTHON_INCLUDE_DIR)
-  set(oPYTHON_INCLUDE_DIR ${PYTHON_INCLUDE_DIR})
-endif(PYTHON_INCLUDE_DIR)
-find_package(PythonLibs)
-if(oPYTHON_LIBRARIES)
-  set(PYTHON_LIBRARIES ${oPYTHON_LIBRARIES})
-endif(oPYTHON_LIBRARIES)
-if(oPYTHON_INCLUDE_DIR)
-  set(PYTHON_INCLUDE_DIR ${oPYTHON_INCLUDE_DIR})
-endif(oPYTHON_INCLUDE_DIR)
-
-
-
-if(NOT PYTHON_LIBRARIES AND NOT PYTHON_INCLUDE_DIR)
-  set(PYTHON_FOUND NO)
-else(NOT PYTHON_LIBRARIES AND NOT PYTHON_INCLUDE_DIR)
-  set(PYTHON_FOUND YES)
-endif(NOT PYTHON_LIBRARIES AND NOT PYTHON_INCLUDE_DIR)
-if(PYTHON_FOUND)
-  MESSAGE(STATUS Found Python libraries in ${PYTHON_INCLUDE_DIR} as ${PYTHON_LIBRARIES})
-else(PYTHON_FOUND)
-  MESSAGE(FATAL_ERROR Python libraries not found)
-endif(PYTHON_FOUND)
-
-
-include_directories(${PYTHON_INCLUDE_DIR})
+find_package(PythonLibs REQUIRED)
+include_directories(${PYTHON_INCLUDE_PATH})
 link_directories(${PYTHON_LIBRARIES})
+
 find_package(NumPy REQUIRED)
 include_directories(${PYTHON_NUMPY_INCLUDE_PATH})

Cheers
G.

--
Gianluca Sforna
http://morefedora.blogspot.com
http://www.linkedin.com/in/gianlucasforna
Re: [Rdkit-discuss] SearchDb functionality in Q32009
Greg,

Yes, that did the trick - thank you. Interestingly, my previous version didn't seem to have that dependency. Odd.

Now if I can just get my apache web server to recognize it as well. 8^)

-Kirk

On Wed, Nov 25, 2009 at 9:47 PM, Greg Landrum greg.land...@gmail.com wrote:

Dear Kirk,

On Wed, Nov 25, 2009 at 10:06 PM, rkdeli...@gmail.com wrote:

With a previous version of RDKit I had been able to do this:

    import SearchDb
    from SearchDb import parser

What are the new namespaces to get this back up?

SearchDb.py is in $RDBASE/Projects/DbCLI. This isn't in the usual recommended PYTHONPATH, so you'll have to add that directory explicitly, something like:

    export PYTHONPATH=$PYTHONPATH:$RDBASE/Projects/DbCLI

hope this helps,
-greg
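For the web server case, one option is to extend the module search path from inside the script instead of relying on the shell's PYTHONPATH, since a server process often does not inherit variables exported in a login shell (that is an assumption about the setup here, not something confirmed in the thread). The sketch below uses only the standard library; `add_dbcli_to_path` is a hypothetical helper name, and the `/opt/rdkit` location in the example is made up.

```python
import os
import sys


def add_dbcli_to_path(rdbase=None):
    """Append <RDBASE>/Projects/DbCLI to sys.path and return that directory.

    rdbase defaults to the RDBASE environment variable (hypothetical helper).
    """
    rdbase = rdbase or os.environ.get("RDBASE")
    if not rdbase:
        raise RuntimeError("RDBASE is not set and no path was given")
    dbcli_dir = os.path.join(rdbase, "Projects", "DbCLI")
    if dbcli_dir not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(dbcli_dir)
    return dbcli_dir


# Example with a made-up install location; calling twice adds it only once.
dbcli_dir = add_dbcli_to_path("/opt/rdkit")
add_dbcli_to_path("/opt/rdkit")
print(dbcli_dir, sys.path.count(dbcli_dir))
```

After a call like this, `import SearchDb` would be attempted from that directory, provided the RDKit installation actually lives there.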
Re: [Rdkit-discuss] Compiling on Red Hat linux
Can't you get a 32-bit source file and rebuild it for 64?

On Fri, Mar 27, 2009 at 2:36 PM, George Oakman oakm...@hotmail.com wrote:

Thank you all for the great comments; what I need to do is starting to be clearer. I just can't find an x86_64 lapack RPM for RHEL4, but I'll keep looking. If one of you knows where I could find it, that'd be great.

George.

From: ig...@helix.nih.gov
To: greg.land...@gmail.com
Date: Fri, 27 Mar 2009 15:46:10 -0400
CC: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Compiling on Red Hat linux

How about now?
http://code.google.com/p/rdkit/wiki/BuildingOnLinux
or
http://code.google.com/p/rdkit/wiki/NewLinuxBuild

Ah, this is much better! Would it be possible to add a bullet point with options for python-less build? even (?!) the default?

That's definitely possible, but I wonder how advisable it is. What fraction of active linux boxes are running 64bit?

I switched all of the linux computers I control to 64 bit a couple of years ago. Out of 60+ systems I think I have 32-bit installed on 2, one being my old home PC... There is simply no good reason to run 32-bit linux on 64-bit hardware. All of the old 32-bit programs work fine for me, and there is a bonus that I don't have the ancient 2/4 Gb filesize/RAM limits.

Igor
Re: [Rdkit-discuss] RDKit - DbCLI
Fantastic! Thanks, Greg!

After I got things working I've been able to generate a database and do some preliminary searches. I'm impressed at how quickly I can search ~100,000 compounds with SMARTS patterns. I have a feeling this one is going to get a lot of use.

-Kirk

On Mon, Nov 24, 2008 at 12:45 AM, Greg Landrum greg.land...@gmail.com wrote:

Dear Kirk,

On Fri, Nov 21, 2008 at 12:38 AM, Robert DeLisle rkdeli...@gmail.com wrote:

After running through the process with exception handling in place I was able to isolate 10 structures that were being problematic. All of them had at least one bond designated as 0 order in the SD file - much as you found for some of the other structures previously. I assume that these passed the initial import step but are failing upon descriptor generation for obvious reasons.

I suppose the only request that I have is for more graceful error handling. I've attached my (admittedly sloppy) version of CreateDB.py showing what I did to isolate the errors.

The problem here was in the mol file parser: it was not correctly setting up bonds that have order 0. Now it generates a warning (order 0 isn't technically allowed by the ctab spec) and sets the bond up correctly. I also added some error checking to handle other bogus bond orders.

This was entered as issue 2337369 ( https://sourceforge.net/tracker2/?func=detail&aid=2337369&group_id=160139&atid=814650 ) and fixed in rev892.

-greg
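Records with zero-order bonds can also be flagged before they reach the parser by reading the bond block directly. What follows is a rough plain-Python (3.x) sketch, not the check that was added to RDKit in rev892: `zero_order_bonds` is a hypothetical helper name, and it assumes V2000 records with fixed-width fields (atom and bond counts in the first two 3-character fields of the counts line, bond order in the third 3-character field of each bond line).

```python
def zero_order_bonds(mol_block):
    """Return (begin_atom, end_atom) pairs, 1-based, for bonds of order 0.

    Assumes a V2000 mol block (hypothetical helper, not an RDKit API).
    """
    lines = mol_block.splitlines()
    counts = lines[3]
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    first_bond = 4 + n_atoms  # bond block follows the atom block
    hits = []
    for line in lines[first_bond:first_bond + n_bonds]:
        begin, end, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if order == 0:
            hits.append((begin, end))
    return hits


# Hand-written three-atom record with one deliberately bogus bond.
SAMPLE = "\n".join([
    "",
    "  example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    3.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  0  0",
    "M  END",
])

print(zero_order_bonds(SAMPLE))
```

Running a scan like this over each record of the SD file gives a list of structures to inspect by hand before loading the database.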
Re: [Rdkit-discuss] RDKit - DbCLI
Greg, After running through the process with exception handling in place I was able to isolate 10 structures that were being problematic. All of them had at least one bond designated as 0 order in the SD file - much as you found for some of the other structures previously. I assume that these passed the initial import step but are failing upon descriptor generation for obvious reasons. I suppose the only request that I have is for more graceful error handling. I've attached my (admittedly sloppy) version of CreateDB.py showing what I did to isolate the errors. -Kirk On Thu, Nov 20, 2008 at 1:33 PM, rkdeli...@gmail.com wrote: Indeed I can. Luckily I had a console window open with the error in place just as I saw your message: [13:21:16] INFO: Done: 54500 Traceback (most recent call last): File C:\RDKit_Q32008_1\Projects\dbcli\CreateDB.py, line 222, in module mol = Chem.Mol(str(pkl)) RuntimeError: Unknown exception I've just wrapped this one in a try-catch block as well. On Nov 20, 2008 1:17pm, Greg Landrum greg.land...@gmail.com wrote: Can you send me the console output without disclosing things you oughtn't to disclose? FYI: the deprecation warnings ought not to be causing the problem. There ought to be a bug report filed against this already, but it looks like I forgot to submit it. grn. -greg On Thu, Nov 20, 2008 at 9:06 PM, wrote: Greg, Thanks for the quick response. In reading my original question I realize I didn't explain myself well. Sorry about that. 8^) I'm trying to set up a database of ~100,000 structures which will be queried by very few structures at a time. While running CreateDB.py I get to the step that gives an output of: 'Generating fingerprints and descriptors:' In reading the output more closely I see that there are some deprecation warnings that mention a distance matrix - that's where my original question regarding a pairwise computation step came from. 
Regardless, after around 50,000 structures, I get a 'Runtime: unexpected exception' message and Python stops. Having done a bit more research I see that each molecule is passed through Atom Pair, Fingerprint, and Descriptor generation. I assume it is failing somewhere within those steps, but I haven't yet identified where or why. I have just wrapped all of those procedures in try-catch blocks in hopes of finding the offending structure. Once I have it, I'll do some tests on it and send it your way. -Kirk On Nov 20, 2008 12:41pm, Greg Landrum wrote: [moving a general-interest question to the mailing list] Hi Kirk, On Thu, Nov 20, 2008 at 6:03 PM, wrote: I have another question on DbCLI. After getting rid of problematic structures, I was able to get DbCLI to the pairwise comparison step, but my I'm not sure what the pairwise comparison step is with the DbCLI stuff. Step one is loading the database with CreateDb.py, step 2 is doing searches with SearchDb.py. What are you asking about? dataset has on the order of 100,000 structures. After about 50,000 structures Python issued an Unexpected error response and stopped. Is this likely due to the enormous size of a pairwise distance table for this dataset? Have to had problems with very large datasets in the past or has this typically worked smoothly? I must admit that I've never queried with that number of structures. My typical use case is to have a large database (10^5-10^6 compounds) and query that with a few (~10) structures. The code hasn't really been written to deal with giant query sets. That is doable, but it would require some reworking. Probably the best bet would be to support loading the queries from a database as well; that way you wouldn't have to reprocess the queries every time and could pretty easily handle the only loading a few at a time problem. It's an interesting thing to think about. 
-greg # $Id: CreateDb.py 665 2008-05-15 04:33:40Z glandrum $ # # Copyright (c) 2007, Novartis Institutes for BioMedical Research Inc. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # * Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # * Redistributions in binary form must reproduce the above # copyright notice, this list of conditions and the following # disclaimer in the documentation and/or
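The approach described in this thread, wrapping each per-molecule step in exception handling so that one bad record does not stop a 100,000-structure run, can be reduced to a small generic loop. This is only a sketch in plain Python (3.x) and not the code from the attached CreateDb.py: `collect_failures` is a hypothetical helper, and `process` stands in for whatever per-record work is being done (unpickling the molecule, generating fingerprints and descriptors, and so on).

```python
def collect_failures(records, process):
    """Apply process() to every record and keep going when one fails.

    Returns a list of (index, record, exception type name) for the
    records that raised (hypothetical helper for illustration).
    """
    failures = []
    for index, record in enumerate(records):
        try:
            process(record)
        except Exception as exc:
            # Remember enough to find the offending record afterwards.
            failures.append((index, record, type(exc).__name__))
    return failures


# Stand-in example: int() plays the role of the per-molecule step.
failures = collect_failures(["1", "x", "3"], int)
print(failures)
```

The indices returned make it straightforward to pull the offending structures out of the input file and test them individually.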
Re: [Rdkit-discuss] H-bond Acceptor problem
I agree with Nik an additional 2 pence. In fact, while reading Greg's original note, my thoughts were essentially identical to Nik's comments. -Kirk On Tue, Oct 28, 2008 at 2:40 AM, nikolaus.sti...@novartis.com wrote: Hi Greg, maybe some comments on your suggestions. 1) Should the renaming mentioned above (i.e. the NumHAcceptor and NumHDonor descriptors start returning the official Lipinski values and the existing functions are renamed to NumHAcceptorAlt and NumHDonorAlt) be done? Personally, I would guess that most people would not expect to receive an N/O count if they are asking for H-donors and acceptors. Hence, I would propably use a different naming convention that includes the Lipinski specification (e.g. LipNumHAcc or similar). That way people will not get confused by very high counts for those values. 2) Is the above SMARTS reasonable for the more detailed HAcceptor definition? As you say - they are very basic but to me they look reasonable. If you actually want to tune them at a low level than I would propably change the F definition to fluoro's attached to aromatic rings only ( I know there is a lot of papers out there that discuss this issue ) but that's only me and I would guess that over time people should fine-tune these definitions to their own like anyway. My 2 pence Nik *Greg Landrum greg.land...@gmail.com* 28.10.2008 06:55 To rdkit-discuss@lists.sourceforge.net cc Subject Re: [Rdkit-discuss] H-bond Acceptor problem I wanted to make one more post on this topic, ask a couple questions (at the bottom of the post), and give people a few days to comment before I regenerate the regression test data and commit a change for this bug. On Wed, Oct 15, 2008 at 8:19 PM, Hans Purkey hans.pur...@gmail.com wrote: If the intention is to follow Lipinski's definitions of Hbond acceptors, then it should be a simple N+O count (look back at the original paper and that is how he difined it for simplicity). 
For those who are coming to this late, this is the NOCount() descriptor, which is already present in the RDKit.

However, if the descriptor is intended to match a more intuitive/realistic definition of HBA, then N-H shouldn't be a part of it.

I don't think I agree with this. There are plenty of cases of nitrogens with attached Hs that act as H-bond acceptors (I did a CCD search yesterday to be sure), but that's a side topic.

Back to the main topic: since these descriptors are all defined in a module named Lipinski, and since this is all qualitative anyway, I'd propose the following change: the existing NumHDonors and NumHAcceptors (with fixes, discussed below) be renamed to NumHDonorsAlt and NumHAcceptorsAlt, and NOCount and NHOHCount be aliased to NumHAcceptors and NumHDonors. I'd then deprecate NOCount and NHOHCount (they will generate warnings when used in the next release and then be completely removed in the release after that).

For the purposes of fixing the more complex HAcceptor descriptor I propose the following SMARTS:

HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(n-...@[o,N,P,S])]),\
$([nH0,o,s;+0]),\
$([F;!$(F-*-F)])]')

There are two changes here: the third line and the last one. The third line includes nitrogens that have three neighbors and that are not connected to another atom that has a non-ring double bond to O, N, P, or S. The last line includes Fs that are not connected to another atom that has more than one F attached (to exclude CF3 and CF2).

I realize these are not highly tuned, very detailed definitions like those in the fdef file discussed elsewhere on this thread, but are they acceptable for a qualitative descriptor?

So, the two questions:

1) Should the renaming mentioned above (i.e. the NumHAcceptor and NumHDonor descriptors start returning the official Lipinski values and the existing functions are renamed to NumHAcceptorAlt and NumHDonorAlt) be done?
2) Is the above SMARTS reasonable for the more detailed HAcceptor definition?

Thanks for any feedback,
-greg
[Rdkit-discuss] RDKit to PyMol
Greg, I notice within the Python API that RDKit has the ability to communicate with PyMol. I have not, however, been able to find an example and haven't quite figured it out on my own. Could you provide an example of opening a file in PyMol through RDKit, please? -Kirk
Re: [Rdkit-discuss] H-bond Acceptor problem
Good point, Hans. I see that within the available descriptors there are NHOHCount and NOCount, which I assume are equivalent to Lipinski's Donors and Acceptors. Also there are NumHAcceptors and NumHDonors, which I would expect to differentiate themselves from the Lipinski versions in some way.

-Kirk

On Wed, Oct 15, 2008 at 1:19 PM, Hans Purkey hans.pur...@gmail.com wrote:

If the intention is to follow Lipinski's definitions of Hbond acceptors, then it should be a simple N+O count (look back at the original paper and that is how he defined it for simplicity). However, if the descriptor is intended to match a more intuitive/realistic definition of HBA, then N-H shouldn't be a part of it.

Hans

On Oct 15, 2008, at 11:50 AM, Greg Landrum wrote:

[heh, worse than sending a message without an attachment is hitting send before the message is done and sending a message without text... sorry]

On Wed, Oct 15, 2008 at 7:59 PM, Robert DeLisle rkdeli...@gmail.com wrote:

As you know, I've been working with descriptors in RDKit, and I think I've found a bug in the calculation of H-bond Acceptors. Attached is an example structure, N-methyl-1H-indole-6-carboxamide. When I calculate NumHAcceptors for this structure, I get 3. I've looked at numerous other structures and it seems that nitrogens are always counted. I went into the code and found the definitions used for HAcceptors:

Here's a simpler case showing the same behavior:

[15] m2 = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')
[16] Lipinski.NumHAcceptors(m2)
Out[16]: 3

so that confirms the wrong count

$([O,S;H1;v2]-[!$(*=[O,N,P,S])])
$([O,S;H0;v2])
$([O,S;-])
$([Nv3;H1,H2]-[!$(*=[O,N,P,S])])
$([N;v3;H0])
$([n,o,s;+0])
F

Unless I'm misinterpreting the SMARTS (a very good possibility), both NH groups are being counted as an acceptor due to matching $([Nv3;H1,H2]-[!$(*=[O,N,P,S])]), but shouldn't the amide NH be excluded according to this same definition?

[20] m2.GetSubstructMatches(Chem.MolFromSmarts('[$([Nv3;H1,H2]-[!$(*=[O,N,P,S])])]'))
Out[20]: ((1,),)

Only matches one nitrogen... the amide nitrogen. The aromatic N matches the second but last definition:

[29] m2.GetSubstructMatches(Chem.MolFromSmarts('[$([n,o,s;+0])]'))
Out[29]: ((6,),)

The problem is that the first definition matches an N that is single bonded to an atom that isn't doubly bonded to O, N, P, or S. It does not exclude Ns that are single bonded to an atom that is doubly bonded to O, N, P, or S. So your amide with a secondary N matches, because its methyl neighbor satisfies the pattern. The problem isn't the matcher, it's the definition. Is that clear?

I agree that this is a bug in the definition and will fix it. Would you mind entering the bug at sf.net or should I do it?

-greg
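The diagnosis can be replayed with a few lines against a current RDKit install. The first two patterns below are the ones quoted in this thread. The third, `stricter`, is not the fix that was proposed or committed: it is an illustrative pattern written for this example to show how a negated recursive SMARTS on the nitrogen itself excludes an N attached to a C=O. `Lipinski.NOCount` and `Lipinski.NHOHCount` are the simple Lipinski-style counts mentioned above. Treat the whole block as a sketch that assumes Python 3 and a recent RDKit.

```python
from rdkit import Chem
from rdkit.Chem import Lipinski

# The simpler test molecule from the thread.
mol = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')

# Patterns quoted in the thread.
old_nh = Chem.MolFromSmarts('[$([Nv3;H1,H2]-[!$(*=[O,N,P,S])])]')
aromatic = Chem.MolFromSmarts('[$([n,o,s;+0])]')

# Illustrative only (an assumption, not the committed fix): reject an N
# that has any single-bonded neighbor doubly bonded to O, N, P, or S.
stricter = Chem.MolFromSmarts('[$([N;v3;H1,H2;!$(N-*=[O,N,P,S])])]')

old_hits = mol.GetSubstructMatches(old_nh)        # amide N, via its methyl neighbor
arom_hits = mol.GetSubstructMatches(aromatic)     # pyrrole-type n
strict_hits = mol.GetSubstructMatches(stricter)   # amide N no longer matches

n_plus_o = Lipinski.NOCount(mol)      # Lipinski-style acceptor count
nh_plus_oh = Lipinski.NHOHCount(mol)  # Lipinski-style donor count

print(old_hits, arom_hits, strict_hits, n_plus_o, nh_plus_oh)
```

The difference between `old_hits` and `strict_hits` is the point made above: the original definition only requires that some neighbor is not doubly bonded to a heteroatom, while the exclusion has to apply to every neighbor.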
Re: [Rdkit-discuss] RDKit Descriptors
Greg,

Thank you for the response. I was able to get PEOE_VSA1 through PEOE_VSA14, SMR_VSA1 through SMR_VSA10, and EState_VSA1 through EState_VSA11 working. Are these the correct limits on the vector components? I was unable, however, to get Slogp_VSA or VSA_EState working with any integer suffix between 1 and 10.

I've also done a correlation analysis on all the descriptors that I've gotten working. After computing descriptors for some 24,000 compounds I removed those with less than 10% variance and limited correlations between variables to a maximum of 0.85 (using KNIME). I'm happy to send a list of the resulting descriptors or a correlation matrix if you or anyone else is interested.

On Wed, Sep 17, 2008 at 11:36 PM, Greg Landrum greg.land...@gmail.com wrote:

Dear Kirk,

On Thu, Sep 18, 2008 at 12:58 AM, Robert DeLisle rkdeli...@gmail.com wrote:

I've finally found time to start using RDKit and started with descriptor calculation. Following the examples on the wiki (http://code.google.com/p/rdkit/wiki/DescriptorsInTheRDKit), I get a KeyError any time I attempt to obtain HeavyAtomCount, RingCount,

HeavyAtomCount and RingCount were introduced after the May release, so they're in the subversion version of the code. They will be in the Q3 release (which will happen sometime in the next couple of weeks, hopefully).

PEOP_VSA, SMR_VSA, Slogp_VSA, EState_VSA, and VSA_Estate.

The various X_VSA descriptors are vector-valued and you access them by element, so you could ask for PEOE_VSA4 or Slogp_VSA10.

(BTW, what is the difference between the two last VSA descriptors?)

The standard VSA descriptors map summed VSA values into bins determined by the other descriptor. So, for example, SMR_VSA uses atomic contributions to the VSA and uses bins determined by atomic contributions to the SMR. EState_VSA is the same, it just uses atomic EState values. VSA_EState is reversed: atomic EState values are put into bins determined by the VSA contributions.

Best Regards,
-greg
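One way to avoid guessing at names and suffix ranges is to ask the installed RDKit which binned VSA descriptors it registers. The sketch below assumes Python 3 and a recent RDKit, where the names are case-sensitive and the logP-binned family is spelled `SlogP_VSA` with a capital P; whether the 2008 release discussed here used the same spelling is not confirmed by this thread. `Descriptors._descList` is a semi-private list of (name, function) pairs, so relying on it is a convenience rather than a documented guarantee, and `vsa_descriptor_names` is a hypothetical helper.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Family prefixes as spelled in current RDKit releases (assumption for 2008).
VSA_PREFIXES = ('PEOE_VSA', 'SMR_VSA', 'SlogP_VSA', 'EState_VSA', 'VSA_EState')


def vsa_descriptor_names():
    """Group the registered descriptor names by VSA family prefix."""
    grouped = {prefix: [] for prefix in VSA_PREFIXES}
    for name, _func in Descriptors._descList:
        for prefix in VSA_PREFIXES:
            # Keep names of the form <prefix><integer>, e.g. PEOE_VSA4.
            if name.startswith(prefix) and name[len(prefix):].isdigit():
                grouped[prefix].append(name)
    return grouped


names = vsa_descriptor_names()
for prefix in VSA_PREFIXES:
    print(prefix, len(names[prefix]))

# Look a single component up by name and evaluate it for one molecule.
functions = dict(Descriptors._descList)
mol = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')
peoe_vsa4 = functions['PEOE_VSA4'](mol)
print(peoe_vsa4)
```

A KeyError for a name such as `Slogp_VSA10` would be consistent with a spelling mismatch rather than a missing descriptor, which is worth checking before concluding that a family is unavailable.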
[Rdkit-discuss] RDKit Descriptors
I've finally found time to start using RDKit and started with descriptor calculation. Following the examples on the wiki ( http://code.google.com/p/rdkit/wiki/DescriptorsInTheRDKit), I get a KeyError any time I attempt to obtain HeavyAtomCount, RingCount, PEOP_VSA, SMR_VSA, Slogp_VSA, EState_VSA, and VSA_Estate. (BTW, what is the difference between the two last VSA descriptors?) -Kirk DeLisle