Re: [Rdkit-discuss] want advice for good teaching data set
Hi Andrew,

ChEMBL 24 has compound properties in the table compound_properties. I think the alogp column is computed using (Crippen) atom types, and the acd_logp column uses ACD/Labs methods.

TJ

On Wed, Aug 29, 2018 at 5:52 AM Andrew Dalke wrote:
> Hi all,
>
> I am starting to put together materials for the Python/RDKit training
> course I'm giving just before the RDKit UGM next month.
>
> I would like to structure part of it around the SQLite release of the
> ChEMBL data set. More specifically, I plan to include examples of machine
> learning with scikit-learn, using RDKit descriptors and values from ChEMBL
> 24 (and making sure to use the new schema).
>
> Two problems. First, I'm not a computational chemist and I don't know what
> would constitute a good example to use. "Good" in this case means one whose
> outlines are well-known to likely students. Second, I don't have much
> experience with the ChEMBL data.
>
> My thought is to make a logP model. The easiest would be to base it on
> atom types. For this option, can anyone suggest where I can find logP data
> from ChEMBL?
>
> Another possibility is to use a pre-existing model, like the notebook
> George Papadatos did for Ligand-based Target Prediction at
> http://nbviewer.jupyter.org/gist/madgpap/10457778 .
>
> Perhaps someone here could point me to other existing resources along
> similar lines?
>
> Best regards,
>
> Andrew
> da...@dalkescientific.com

Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
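A minimal sketch of pulling both logP columns out of the ChEMBL SQLite release with Python's built-in sqlite3 module, as a starting point for the training data. It assumes the ChEMBL 24 layout in which compound_properties (molregno, alogp, acd_logp) joins to compound_structures (molregno, canonical_smiles); the function name and the chembl_24.db filename used below are hypothetical. The scikit-learn modelling step is not shown.

```python
import sqlite3

# Assumed ChEMBL 24 schema: compound_properties and compound_structures share
# the molregno key. alogp is reportedly Crippen-based; acd_logp is from ACD/Labs.
# Rows without a SMILES string or without an alogp value are skipped.
LOGP_QUERY = """
SELECT cs.canonical_smiles, cp.alogp, cp.acd_logp
FROM compound_properties AS cp
JOIN compound_structures AS cs ON cs.molregno = cp.molregno
WHERE cp.alogp IS NOT NULL
  AND cs.canonical_smiles IS NOT NULL
LIMIT ?
"""


def fetch_logp(conn, limit=1000):
    """Return up to `limit` (smiles, alogp, acd_logp) tuples from an open connection."""
    return conn.execute(LOGP_QUERY, (limit,)).fetchall()
```

Usage would look like `fetch_logp(sqlite3.connect("chembl_24.db"), limit=5)`, where the path points at a downloaded ChEMBL 24 SQLite file. The SMILES could then be turned into RDKit descriptors and paired with alogp or acd_logp as the regression target.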
Re: [Rdkit-discuss] SMARTS for an amide in an aromatic ring
Try either of these:

[N,n](C)-,:[C,c](=[O])
C[N,n]-,:[C,c](=[O])

TJ O'Donnell

On Mon, Sep 18, 2017 at 4:26 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
> Hello,
>
> Given the following aromatic structure
>
> m = Chem.MolFromSmiles("CN1C=CC(N)=NC1=O")
>
> I would like to construct a SMARTS pattern to
> recognize the aromatic amide (nitrogen attached to
> the exocyclic methyl group) and not recognize the other
> NCO group of atoms.
>
> I have tried
>
> pattern = Chem.MolFromSmarts('[N,n]-,:[C,c](=[O])')
>
> but this matches *both* NCO groups of atoms, which
> I do not want.
>
> The completely "aliphatic version"
>
> pattern = Chem.MolFromSmarts('[N]-[C](=[O])')
>
> does not match either NCO group of atoms.
>
> I am stumped. I have also tried several recursive
> SMARTS expressions, but I can't get the syntax
> right.
>
> I would appreciate any suggestions. Thank you.
>
> Regards,
> Jim Metz
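A quick way to check TJ's suggestion in RDKit is to count matches for each pattern on Jim's molecule. The added `(C)` branch requires the ring nitrogen to carry an aliphatic carbon, which only the N-methyl nitrogen has; the dictionary keys and helper name below are just illustrative.

```python
from rdkit import Chem

# Jim's example molecule; RDKit perceives this ring as aromatic.
mol = Chem.MolFromSmiles("CN1C=CC(N)=NC1=O")

patterns = {
    "original": "[N,n]-,:[C,c](=[O])",   # Jim's pattern: hits both N-C=O groups
    "tj_1": "[N,n](C)-,:[C,c](=[O])",    # N must also bond to an aliphatic C
    "tj_2": "C[N,n]-,:[C,c](=[O])",      # same constraint, written C-first
}


def count_matches(mol, smarts):
    """Number of unique substructure matches of `smarts` in `mol`."""
    query = Chem.MolFromSmarts(smarts)
    return len(mol.GetSubstructMatches(query))


for name, smarts in patterns.items():
    print(name, smarts, count_matches(mol, smarts))
```

Because the methyl is the only aliphatic carbon attached to a ring nitrogen, both of TJ's patterns should give a single match, on the nitrogen at atom index 1.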
Re: [Rdkit-discuss] Non-redundant database of molecules
Let the database do the work for you. Create a canonical SMILES column and/or an InChI column and declare them to be unique. As you insert new rows, PostgreSQL will tell you if there is already a row with the same SMILES or InChI. Here's some help on how to handle that:
https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT

TJ O'Donnell

On Wed, Sep 13, 2017 at 3:13 AM, Wandré <wandrevel...@gmail.com> wrote:
> Hi,
>
> My name is Wandré and I'm from Brazil.
> I'm trying to build a big database of molecules, but I want to eliminate all
> the redundant molecules before inserting them into the database.
> I want to know the best method to identify a molecule in RDKit.
> Is it SMILES ("Chem.MolToSmiles(mol,isomericSmiles=True)"), or will I need to
> compare all molecules, one by one, before inserting them into the database (using
> Tanimoto)?
> This could be hard because my database will have many millions of
> molecules, so is comparing one by one before inserting the only answer?
> Checking whether the SMILES has already been inserted is easy (text compare), but
> comparing fingerprints...
>
> If I really need to compare fingerprints, how can I store this
> data in PostgreSQL without using the cartridge? I would generate the fingerprint
> (AtomPair, for example), store it in the database, and compare
> all the fingerprints, one by one, before inserting a new molecule. This
> fingerprint (AtomPair) has lots of features, so storing it in a relational
> database is expensive.
> Is it possible?
>
> Thanks!
>
> --
> Wandré Nunes de Pinho Veloso
> Assistant Professor - Unifei - Campus Avançado de Itabira-MG
> PhD candidate in Bioinformatics - Universidade Federal de Minas Gerais - UFMG
> Researcher at INSILICO - Interdisciplinary Group for Simulation and
> Computational Intelligence - UNIFEI
> Member of the Biological Signatures Research Group at FIOCRUZ
> Member of the Structural Bioinformatics Research Group at UFMG
> Laboratory of Bioinformatics and Systems - LBS, DCC, UFMG
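A minimal sketch of TJ's "let the database do the work" approach. To stay self-contained it uses SQLite from the Python standard library instead of PostgreSQL; SQLite 3.24 and later accept the same `ON CONFLICT ... DO NOTHING` clause. The table and function names are hypothetical, and an InChI column could be handled the same way.

```python
import sqlite3

from rdkit import Chem


def open_db(path=":memory:"):
    """Create a table whose canonical SMILES column rejects duplicates."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS molecules ("
        " id INTEGER PRIMARY KEY,"
        " smiles TEXT NOT NULL UNIQUE)"
    )
    return conn


def insert_molecule(conn, smiles):
    """Canonicalize `smiles` and insert it; return True only if a new row was added.

    Duplicates are skipped by the UNIQUE constraint plus ON CONFLICT DO NOTHING,
    so no pairwise comparison is needed. Unparseable SMILES also return False.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    canonical = Chem.MolToSmiles(mol, isomericSmiles=True)
    before = conn.total_changes
    conn.execute(
        "INSERT INTO molecules (smiles) VALUES (?) ON CONFLICT (smiles) DO NOTHING",
        (canonical,),
    )
    return conn.total_changes > before
```

Different input spellings of the same structure (for example `OCC` and `CCO`) canonicalize to the same string, so the second insert is silently skipped. The uniqueness check is an index lookup, which scales to millions of rows far better than Tanimoto comparisons; note that a similarity of 1.0 does not by itself prove two structures are identical.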
Re: [Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens
I verified that r6 does the trick. Using my rdchord cartridge, I get:

tjo=> select rd.list_matches(rd.rdmol('OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O'), '[O;H0;D2;r6]',0,1);
 list_matches
 {{4},{11},{18},{25},{32},{39}}
(1 row)

tjo=> select rd.list_matches(rd.rdmol('OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O'), '[O;H0;D2;!r6]',0,1);
 list_matches
 {{6},{13},{20},{27},{34},{41}}
(1 row)

Here's an image showing the atom numbers corresponding to the list_matches output.

TJ

[image: Inline image 2]

On Wed, Sep 6, 2017 at 6:04 PM, TJ O'Donnell <t...@acm.org> wrote:
> Try using [O;H0;D2;r6] (lower-case r). Sorry, I'm not at a computer to
> check this.
> R6 means in 6 rings.
> r6 means in a ring of size 6.
>
> http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
>
> TJ O'Donnell
>
> On Wed, Sep 6, 2017 at 4:34 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
>> Hello,
>>
>> Given the following SMILES for a macrocyclic hexaose
>>
>> OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O
>>
>> can anyone suggest a SMARTS pattern that will distinguish ether oxygens
>> in the smaller 6-membered rings versus the ethers in the larger macrocyclic
>> structure?
>>
>> For example, using RDKit, I have tried (e.g., pattern =
>> Chem.MolFromSmarts('[O;H0;D2]') )
>>
>> [O;H0;D2] ===> gives 12 matches (all ether oxygens)
>> [O;H0;D2;R] ===> gives 12 matches (all ether oxygens)
>> [O;H0;D2;!R] ===> gives 0 matches
>> [O;H0;D2;R6] ===> gives 0 matches
>>
>> I am stumped. Any ideas?
>>
>> If it is necessary to write more complicated Python/RDKit/SMARTS
>> code, I am certainly willing to try that.
>>
>> Thanks!
>>
>> Regards,
>> Jim Metz
>> Northwestern University
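The same check can be run directly in RDKit from Python, without the cartridge. This sketch prints the 0-based atom indices matched by each of Jim's and TJ's queries; the helper name is illustrative. TJ's list_matches output appears to be 1-based, so indices here would be one lower.

```python
from rdkit import Chem

# The cyclic hexasaccharide from Jim's question, joined onto one line.
HEXAOSE = (
    "OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)"
    "OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O"
)


def ether_matches(smiles, smarts):
    """0-based indices of atoms matched by a single-atom SMARTS query."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    return [match[0] for match in mol.GetSubstructMatches(query)]


# R6 = "in 6 SSSR rings"; r6 = "in a smallest SSSR ring of size 6".
for smarts in ("[O;H0;D2]", "[O;H0;D2;R6]", "[O;H0;D2;r6]", "[O;H0;D2;!r6]"):
    print(smarts, ether_matches(HEXAOSE, smarts))
```

The r6 and !r6 queries split the 12 ether oxygens into 6 pyranose ring oxygens and 6 glycosidic oxygens, which sit only in the large macrocycle.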
Re: [Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens
Try using [O;H0;D2;r6] (lower-case r). Sorry, I'm not at a computer to check this.
R6 means in 6 rings.
r6 means in a ring of size 6.

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

TJ O'Donnell

On Wed, Sep 6, 2017 at 4:34 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
> Hello,
>
> Given the following SMILES for a macrocyclic hexaose
>
> OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O
>
> can anyone suggest a SMARTS pattern that will distinguish ether oxygens
> in the smaller 6-membered rings versus the ethers in the larger macrocyclic
> structure?
>
> For example, using RDKit, I have tried (e.g., pattern =
> Chem.MolFromSmarts('[O;H0;D2]') )
>
> [O;H0;D2] ===> gives 12 matches (all ether oxygens)
> [O;H0;D2;R] ===> gives 12 matches (all ether oxygens)
> [O;H0;D2;!R] ===> gives 0 matches
> [O;H0;D2;R6] ===> gives 0 matches
>
> I am stumped. Any ideas?
>
> If it is necessary to write more complicated Python/RDKit/SMARTS code,
> I am certainly willing to try that.
>
> Thanks!
>
> Regards,
> Jim Metz
> Northwestern University
Re: [Rdkit-discuss] connecting to postgres in rdkit environment
The server itself must be told to allow remote connections. You might check these two things:

1. Edit the postgresql.conf file (I'm not sure where that is on your system):
   https://www.postgresql.org/docs/9.2/static/runtime-config-connection.html
   Uncomment or add the line listen_addresses='*'. You can tailor that to be more specific, but try this first.

2. The file pg_hba.conf also controls access. Look at this:
   https://www.postgresql.org/docs/9.3/static/auth-pg-hba-conf.html

Be sure to restart the server after you make changes to these files.

Hope this helps,
TJ O'Donnell

On Sat, Feb 25, 2017 at 12:34 PM, <nbell8...@yahoo.com> wrote:
> Hi,
> I've installed RDKit on a CentOS machine using Anaconda Python and set up
> a PostgreSQL compound database in the rdkit environment. It works great on
> the machine's console.
> I now want to access it remotely, and I'm trying to set up a JDBC Postgres
> driver to access it from a Windows client, but this is not working. If I
> test the driver on the server, it tells me that the connection is refused
> and I should check that the machine is accepting TCP requests.
>
> I have opened the standard port that Postgres uses:
> -A INPUT -m state --state NEW -m tcp -p tcp --dport 5432 -j ACCEPT
>
> iptables -L returns
> ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:postgres
>
> This is where I don't know what to check next. A few things might be
> relevant. If I run "ps -eaf | grep post", I see four postgres processes running
> under my username (not postgres), so I think there is a server working.
> There is also a "system" PostgreSQL (version 9.2) which I connected to
> a long time ago. That connection no longer works either; I don't
> really care about it, but it could be an interfering factor.
>
> If anyone has suggestions about what to check next or how to solve this, I'd be
> grateful.
>
> thanks,
> Neil
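Before and after editing listen_addresses and pg_hba.conf, it can help to check whether anything accepts TCP connections on port 5432 at all, independently of JDBC. This is a small standard-library sketch (the function name is hypothetical). Run it from the Windows client, or on the server against its external address rather than localhost, since PostgreSQL's default listen_addresses is localhost only.

```python
import socket


def probe_tcp(host, port=5432, timeout=3.0):
    """Try a plain TCP connection and classify the outcome.

    "open"    : something is listening and accepted the connection.
    "refused" : usually nothing listens on that address (or a REJECT rule).
    "timeout" : more often a sign that packets are silently dropped on the way.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

An "open" result only shows that the port is reachable; authentication rules in pg_hba.conf are checked afterwards, when the driver actually logs in.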
Re: [Rdkit-discuss] Struggling with apache + rdkit + django
I would suggest setting PYTHONPATH in the config or ini files for Apache, Django, or uWSGI. I'm not sure which one is required.

On Tue, Jun 21, 2016 at 11:15 AM, Téletchéa Stéphane <stephane.teletc...@univ-nantes.fr> wrote:
> On 21/06/2016 20:05, Bennion, Brian wrote:
>> What is the actual problem that is occurring? You have listed what you
>> have tried to do to fix a problem.
>>
>> Brian
>
> Dear Brian,
>
> I get a 500 error, meaning something is not working properly, but there is no
> trace in the logs (either Apache or Django),
> so I can only "assume" it comes from there, since in "developer"
> mode there is no problem (everything works as expected).
>
> Sorry for the confusion,
>
> Stéphane
>
> --
> Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein
> Design In Silico
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322
> Nantes cedex 03, France
> Tel: +33 251 125 636 / Fax: +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org
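If the web server does not pass PYTHONPATH through to the application, one common workaround (an assumption here, not something confirmed in this thread) is to extend sys.path at the top of the WSGI entry point, before anything imports rdkit. The /opt/rdkit directory and the helper name are hypothetical; adjust them to your installation.

```python
import sys

# Hypothetical directory that contains the rdkit Python package.
RDKIT_PYTHONPATH = "/opt/rdkit"


def prepend_to_sys_path(*dirs):
    """Move each directory to the front of sys.path (once), keeping the given order."""
    for d in reversed(dirs):
        if d in sys.path:
            sys.path.remove(d)
        sys.path.insert(0, d)
    return sys.path[: len(dirs)]


prepend_to_sys_path(RDKIT_PYTHONPATH)

# In a real wsgi.py, the Django application would be created afterwards, e.g.:
# import os
# os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
# from django.core.wsgi import get_wsgi_application
# application = get_wsgi_application()
```

This only covers the Python package path. If RDKit's shared libraries are not on a standard library path, LD_LIBRARY_PATH generally has to be set in the server's environment before the process starts; changing it from inside a running process does not affect the dynamic loader.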
Re: [Rdkit-discuss] molecule standardization in cartridge search
Tim,

I have a set of PostgreSQL Python (PL/Python) functions using RDKit. It is available at https://github.com/tjod/rdchord, with some docs at https://github.com/tjod/rdchord/wiki

TJ O'Donnell

On Fri, Sep 25, 2015 at 6:54 AM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
> Jan,
>
> thanks for that. I'll give it a try.
> Are there any examples of writing RDKit functions and procedures for
> Postgres in Python?
> I see these general Postgres docs:
> http://www.postgresql.org/docs/9.4/static/plpython.html
> but wondered if there are any RDKit-specific examples anywhere?
>
> Tim
>
> On 25/09/2015 08:30, Jan Holst Jensen wrote:
>> On 2015-09-24 16:22, Tim Dudgeon wrote:
>>> I'm trying to get to grips with using the RDKit cartridge, and so far
>>> it's going well.
>>> One thing I'm concerned about is molecule standardization, along the
>>> lines of the ChemAxon Standardizer, which allows substructure searches to
>>> be done in a way that is largely independent of the quirks of structure
>>> representation. The classic example would be how nitro groups are
>>> represented, so that it didn't matter which nitro representation was in
>>> the query or target structures, because both were converted to a
>>> canonical form.
>>>
>>> My initial thoughts are that this would be done by:
>>> 1. loading the "raw" structures into a source column that would never be
>>> changed
>>> 2. defining a function that performed the necessary transform to
>>> generate the canonical form of a molecule
>>> 3. generating a "canonical" structure column that was the result of
>>> passing the raw structures through that function
>>> 4. building the SSS index on that canonical column
>>> 5. executing queries using that function to canonicalize the query
>>> structure
>>>
>>> The problem I'm finding is that there do not seem to be Postgres
>>> functions defined for doing molecular transforms (essentially a reaction
>>> transform) and doing things like removing explicit hydrogens. At least
>>> not in the functions listed on this page:
>>> http://rdkit.org/docs/Cartridge.html#functions
>>>
>>> Am I missing something here, or might I be barking up completely the
>>> wrong tree?
>>>
>>> Tim
>>
>> Hi Tim,
>>
>> We have about the same situation, and we're adding standardization
>> (beyond what RDKit implicitly does when it sanitizes the molecule)
>> through Python stored procedures. You will need to build and maintain
>> a normal Python-enabled RDKit installation in parallel to the
>> cartridge. The Python stored procedures can access the normal RDKit
>> installation and then run whatever Python code is necessary to do
>> additional molecule cleanup.
>>
>> You will need to tweak your Postgres environment so the Python stored
>> procedures can load RDKit. This is what I have defined in an
>> environment file on CentOS:
>>
>> RDBASE=/opt/rdkit
>> LD_LIBRARY_PATH=/opt/rdkit/lib
>> PYTHONPATH=/opt/rdkit
>>
>> On Ubuntu this would go into /etc/postgresql/9.x/main/environment (in
>> a slightly different format where the values have to be single-quoted).
>>
>> Cheers
>> -- Jan, Biochemfusion
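A minimal sketch of step 2 in Tim's outline (a function that maps a structure to its canonical form), written as plain RDKit Python that a PL/Python stored procedure could call once RDKit is importable as Jan describes. It relies on what sanitization already does (a pentavalent nitro N(=O)=O is converted to the charge-separated [N+](=O)[O-] form) plus explicit-hydrogen removal. Both function names are hypothetical, and any further reaction-based transforms would still need to be added.

```python
from rdkit import Chem


def standardize_mol(mol):
    """Return a standardized copy of `mol`: here, only explicit H atoms are removed.

    Further transforms (e.g. reaction-based rewrites) would be added here.
    """
    return Chem.RemoveHs(mol)


def canonical_key(smiles):
    """Canonical SMILES for the 'canonical' column, or None if parsing fails.

    Parsing with sanitization already turns N(=O)=O nitro groups into
    [N+](=O)[O-], so both nitro spellings give the same key.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(standardize_mol(mol))
```

Applying the same function to the stored structures (step 3) and to incoming queries (step 5) is what makes the searches independent of the input representation.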
Re: [Rdkit-discuss] https://sourceforge.net
How about including a link on SourceForge to this:
https://help.github.com/articles/support-for-subversion-clients/
so that folks without git clients can get started.

TJ

On Fri, Mar 20, 2015 at 9:48 PM, Greg Landrum greg.land...@gmail.com wrote:
> The mailing lists and one form of the downloads are hosted there.
>
> It's a very good point that having the trackers still active on SourceForge is confusing. I just deleted them.
>
> We should also do something about the svn repo that's there, just to make clear that it's no longer active. Does anyone see a problem with me doing a commit there that removes all the code and just leaves a "look in GitHub" readme?
>
> On Fri, Mar 20, 2015 at 7:44 PM, Soren Wacker swac...@ucalgary.ca wrote:
>> Hi,
>>
>> rdkit has moved to GitHub, but there is still the repository on sourceforge.net. However, if you google 'rdkit bugs', the SourceForge page comes up first. I find that confusing. Is there a reason to keep the sourceforge.net stuff? If not, why don't you remove the SourceForge repository?
>>
>> kind regards
>> Soren
Re: [Rdkit-discuss] Oracle, pypl and rdkit
I've implemented a suite of RDKit functions for PostgreSQL using PL/Python (https://github.com/tjod/rdchord), and the overhead is minimal, since most of the heavy lifting of substructure searching is done by RDKit. I think the same would be true of Oracle.

---
TJ O'Donnell

On Thu, Mar 12, 2015 at 4:24 PM, Michal Krompiec michal.kromp...@gmail.com wrote:
> Hello,
> has anybody tried to implement substructure searching in an Oracle database using PYPL and RDKit? Is it just a matter of writing a wrapper function for molecule.HasSubstructMatch(pattern), or is the overhead of calling pypl each time too costly timewise? Do consecutive pypl calls always share the same interpreter?
> Best wishes,
> Michal
Re: [Rdkit-discuss] autodock vina pdbqt file to mol2
Babel can read and write both pdbqt and mol2 files. I'm not sure how the atom ordering might be accomplished, though.

TJ

On May 9, 2014 2:43 PM, Jan Domanski jan...@gmail.com wrote:
> Thanks for the quick reply, Christos!
>
> I found the pdbqt_to_pdb script that you mentioned, but a Google search for a pdbqt-to-mol2 converter yields nothing (other than this thread).
>
> The pdbqt_to_pdb converter is very crude: it retains only the best pose from _out.pdbqt and basically just strips the BRANCH and ROOT tags deposited by AutoDock (which I was doing anyway with sed). The main problems remaining are atom order (I can fix that) and missing hydrogens (can't fix that). There is a mode where I can prevent prepare_ligand4.py from removing the hydrogens, but the output poses then have really weird geometry.
>
> But let's refocus a little: this is not an AutoDock Vina question (although many folks here are knowledgeable enough to help me). This is a question about a mol2 file to which it should be possible to add Hs with RDKit, and somehow it's not happening (at least not in my hands). My mol2 could be malformatted somehow.
>
> On 9 May 2014 20:57, Christos Kannas chriskan...@gmail.com wrote:
>> Hi Jan,
>>
>> AutoDock has a set of tools (MGLTools) that can convert pdb to pdbqt and vice versa. If I recall correctly, they can also convert pdbqt to mol2.
>> See this discussion: http://autodock.1369657.n2.nabble.com/ADL-pdbqt-to-mol2-td6755769.html
>>
>> Best,
>>
>> Christos
>>
>> Christos Kannas
>> Researcher
>> Ph.D. Student
>> Mob (UK): +44 (0) 7447700937
>> Mob (Cyprus): +357 99530608
>> LinkedIn: http://cy.linkedin.com/in/christoskannas
>>
>> On 9 May 2014 20:17, Jan Domanski jan...@gmail.com wrote:
>>> Hi guys,
>>>
>>> I'm really stuck here: I have some output from AutoDock Vina in a rather obscure pdbqt format. It's a little bit like pdb, but not quite. I'm trying to get back a mol2 file. The AutoDock pdbqt file has only the polar hydrogens in it, so part of the trick is to re-add the hydrogens.
>>>
>>> Example AutoDock Vina output is attached (it's a conformer of the ACE native ligand, DUDE).
>>>
>>> First of all, I convert that to a PDB file with a simple sed:
>>> sed -e '/ROOT/d' -e '/BRANCH/d'
>>> Then I reorder the atoms to match those of the original crystal_ligand.mol2 (because AutoDock reorders the atoms, duh). Finally, I save a mol2 file (attached), ordered as the original crystal_ligand and with polar hydrogens (for each pose of a conformer).
>>>
>>> Let's go to RDKit and try to add hydrogens:
>>>
>>> mol = Chem.MolFromMol2File(output, removeHs=False)
>>> mol2 = AllChem.AddHs(mol, addCoords=True)
>>> print(mol.GetNumAtoms(), mol2.GetNumAtoms())
>>> 44 44
>>>
>>> So, only the implicit hydrogens are present. Calling AddHs doesn't raise an error, and it doesn't really change the number of hydrogens...
>>>
>>> Now, this may not be the best way of doing things: what I care about is getting a mol2 from AutoDock Vina that I can compare to the original mol2 from DUD (same atom order, same number of atoms). Maybe there are other ways to achieve this: one idea would be to inject the docked pose coordinates into the original mol2 atoms (heavy atoms and polar hydrogens) and somehow adjust the non-polar hydrogens.
>>>
>>> Thanks,
>>> - Jan
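One possible explanation for the unchanged "44 44" (not confirmed in this thread) is that AddHs only adds hydrogens an atom is counted as having: explicit H counts plus implicit Hs. If the heavy atoms report zero implicit Hs, for example because they are flagged NoImplicit, there is nothing to add. This hypothetical diagnostic helper shows where the hydrogens are being counted, so it can be run on the molecule read from the mol2 file.

```python
from rdkit import Chem


def hydrogen_summary(mol):
    """Count H atoms in the graph, implicit Hs on atoms, and atoms flagged NoImplicit."""
    atoms = list(mol.GetAtoms())
    return {
        "h_atoms": sum(1 for a in atoms if a.GetAtomicNum() == 1),
        "implicit_hs": sum(a.GetNumImplicitHs() for a in atoms),
        "no_implicit_atoms": sum(1 for a in atoms if a.GetNoImplicit()),
    }


# Illustration on ethanol: normally AddHs has 6 implicit Hs to turn into atoms,
# but once every atom is flagged NoImplicit it has nothing to add.
ethanol = Chem.MolFromSmiles("CCO")
blocked = Chem.Mol(ethanol)
for atom in blocked.GetAtoms():
    atom.SetNoImplicit(True)
blocked.UpdatePropertyCache(strict=False)
print(hydrogen_summary(ethanol), Chem.AddHs(ethanol).GetNumAtoms())
print(hydrogen_summary(blocked), Chem.AddHs(blocked).GetNumAtoms())
```

If the summary for the mol2-derived molecule shows implicit_hs equal to 0, the missing non-polar hydrogens would have to come from elsewhere, for example from corrected atom types or from a template structure.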
Re: [Rdkit-discuss] RDKit cartridge - opposite of mol_from_ctab() would be nice.
Hi All,

I would like to announce the availability of a somewhat different RDKit-based PostgreSQL extension. It uses RDKit for all the basic cheminformatics functions (canonical SMILES, molfile handling, SMARTS matching, fingerprints, etc.) but is based on Postgres' PL/Python language. It does not use the existing RDKit Postgres cartridge, although I have demonstrated that the two can be used side by side (via RDKit pickled mol objects). I hope this use of Python might make it easier to extend Postgres even further with additional functions based on RDKit.

The code can be checked out from SourceForge using this:
svn checkout svn://svn.code.sf.net/p/sci3d/code/trunk/openchord/src/rdkitchord

This is a work in progress, so I would appreciate any feedback. There are still some wrinkles that need to be ironed out. I plan to document the installation and usage better, probably using GitHub.

TJ O'Donnell

On Sat, Feb 22, 2014 at 10:53 PM, Greg Landrum greg.land...@gmail.com wrote:
> On Fri, Feb 21, 2014 at 5:45 PM, Jan Holst Jensen j...@biochemfusion.com wrote:
>> Hi Greg,
>>
>> It would be great to gain the experience. I am working on a registration project where we will likely need to surface additional functions in the cartridge, just to try them out. So, knowing how to do that in a way where things that turn out useful can be contributed back cleanly would be great.
>
> Sounds good.
>
>>> if structures don't have conformers
>>
>> Ah, yes; good question. Decisions, decisions... I'll dodge the question :-) and say it sounds like a perfect fit for an optional parameter, e.g.
>>
>> mol_to_ctab(m mol, add_depiction_if_missing bool default true)
>>
>> I would go for default true because I believe that is the general preference.
>
> Having the optional argument default to true makes sense to me.
>
> Here's an attempt to briefly summarize what needs to be changed in order to add the new functionality:
> - Add mol_to_ctab to rdkit_io.c
> - Add molToCtabText (or some such thing) to adapter.cpp and rdkit.h
> - Add mol_to_ctab() definitions to rdkit.sql91.in and, if you want to support older versions of Postgres, rdkit.sql.in
> - Update link dependencies in the Makefile if necessary (this will be necessary if you add depictions)
> - Add tests to one of the files in sql/ (the most logical place is probably rdkit-91.sql, and rdkit-pre91.sql if you are supporting older versions) and the corresponding output file in expected/
>
> I think that's it.
>
> -greg
>
>> Cheers
>> -- Jan
>>
>> On 2014-02-21 16:47, Greg Landrum wrote:
>>> Hi Jan,
>>>
>>> Great idea. I'd be happy to add it, but I can also talk you through it if you want to gain the experience.
>>>
>>> One important question: if structures don't have conformers (if they are loaded from SMILES, for example), should ctabs with all-zero coordinates be generated, or should depictions be generated?
>>>
>>> -greg
>>>
>>> On Fri, Feb 21, 2014 at 2:23 PM, Jan Holst Jensen j...@biochemfusion.com wrote:
>>>> Hi Greg,
>>>>
>>>> Are there any plans for a mol_to_ctab() function in the PG cartridge? It would make SD file export from the database a bit easier.
>>>>
>>>> If there are no immediate plans, I can take a stab at adding it myself.
>>>>
>>>> * Looks like rdkit_io.c is the place to add it?
>>>> * Should I manually define the new SQL function in rdkit.sql.in, or is there some higher-level place I should add it instead?
>>>>
>>>> Cheers
>>>> -- Jan
Re: [Rdkit-discuss] PDB reader and bond perception
Hi JP

I use this file from PDB Europe: ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz

Useful links followed from http://www.ebi.ac.uk/pdbe-srv/pdbechem/

The pdb.tar.gz file has the standard residues and LOTS of others with specific CONECT records.

TJ

On Mon, Jan 13, 2014 at 9:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:

RDKitters!

Finally back on the mailing list! I am sure we've been through this at the UGM (my mind must have wandered off!), but a quick question about the PDB reader and bond perception. Is this supported with the current PDB reader? I remember that someone (PaulE, perhaps?) was saying bond perception was painful, but there was some dictionary for PDB ligands which helps (any idea what this dictionary is called?).

To the technical details. I am reading in the following PDB file with a simple MolFromPDBFile() call:

HETATM    1  O1P 84T A1862     -27.016   9.387 -72.564  1.00 20.81           O
HETATM    2  P   84T A1862     -27.282   9.818 -73.968  1.00 19.65           P
HETATM    3  O2P 84T A1862     -27.881  11.176 -74.182  1.00 21.49           O
HETATM    4  N   84T A1862     -25.869   9.583 -74.813  1.00 19.78           N
HETATM    5  C   84T A1862     -25.759  10.010 -76.075  1.00 19.97           C
HETATM    6  CA  84T A1862     -24.493   9.748 -76.807  1.00 19.75           C
HETATM    7  CB  84T A1862     -24.794   8.678 -77.847  1.00 19.73           C
HETATM    8  CG  84T A1862     -23.571   8.324 -78.681  1.00 19.70           C
HETATM    9  CD2 84T A1862     -23.309   9.519 -79.611  1.00 18.49           C
HETATM   10  CD1 84T A1862     -23.863   6.932 -79.305  1.00 18.60           C
HETATM   11  OHB 84T A1862     -25.210   7.467 -77.223  1.00 19.17           O
HETATM   12  OH  84T A1862     -23.549   9.127 -75.984  1.00 20.33           O
HETATM   13  O   84T A1862     -26.672  10.517 -76.692  1.00 20.26           O
HETATM   14  O5' 84T A1862     -28.377   8.861 -74.619  1.00 19.39           O
HETATM   15  C5' 84T A1862     -28.002   7.536 -74.954  1.00 18.47           C
HETATM   16  C4' 84T A1862     -28.909   7.000 -76.012  1.00 18.24           C
HETATM   17  C3' 84T A1862     -28.901   7.826 -77.298  1.00 18.28           C
HETATM   18  C2' 84T A1862     -30.318   7.610 -77.768  1.00 18.69           C
HETATM   19  O2' 84T A1862     -30.789   8.641 -78.581  1.00 19.64           O
HETATM   20  O4' 84T A1862     -30.262   6.951 -75.529  1.00 18.80           O
HETATM   21  C1' 84T A1862     -31.152   7.470 -76.521  1.00 19.01           C
HETATM   22  N9  84T A1862     -31.753   8.732 -76.009  1.00 20.08           N
HETATM   23  C4  84T A1862     -33.033   9.013 -76.158  1.00 21.10           C
HETATM   24  N3  84T A1862     -34.018   8.339 -76.786  1.00 21.58           N
HETATM   25  C2  84T A1862     -35.263   8.846 -76.830  1.00 21.95           C
HETATM   26  C8  84T A1862     -31.223   9.701 -75.291  1.00 20.27           C
HETATM   27  N7  84T A1862     -32.173  10.618 -75.019  1.00 21.28           N
HETATM   28  C5  84T A1862     -33.315  10.213 -75.563  1.00 21.81           C
HETATM   29  C6  84T A1862     -34.624  10.702 -75.627  1.00 22.85           C
HETATM   30  N1  84T A1862     -35.550  10.010 -76.285  1.00 22.44           N
HETATM   31  N6  84T A1862     -35.008  11.862 -75.052  1.00 23.86           N
TER
END

But I am losing all the double-bond (and aromatic) information:

m = Chem.MolFromPDBFile(sys.argv[1])
print(Chem.MolToSmiles(m))

gives me:

CC(C)C(O)C(O)C(O)NP(O)(O)OCC1CC(O)C(N2CNC3C2NCNC3N)O1

As usual, many thanks for your time,
- Jean-Paul Ebejer
Early Stage Researcher

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
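TJ points out that the PDBeChem file carries explicit CONECT records for each ligand. As a rough illustration of what those records provide, here is a minimal stdlib sketch that collects bonded atom pairs from CONECT lines using the fixed PDB columns (atom serial in columns 7-11, bonded partners in 12-16, 17-21, 22-26 and 27-31). It recovers connectivity only, not bond orders, and `parse_conect` and the sample records are hypothetical, written for illustration rather than taken from RDKit.

```python
def parse_conect(lines):
    """Return a set of undirected bonds (i, j), i < j, from PDB CONECT records.

    Hypothetical helper: reads connectivity only; bond orders are not recovered.
    """
    bonds = set()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        origin = int(line[6:11])  # columns 7-11: serial of the central atom
        # Columns 12-31 hold up to four 5-character bonded-atom serials.
        for start in range(11, 31, 5):
            field = line[start:start + 5].strip()
            if field:
                partner = int(field)
                bonds.add((min(origin, partner), max(origin, partner)))
    return bonds


# Hypothetical records: atom 2 is bonded to atoms 1, 3 and 4.
records = [
    "CONECT    1    2",
    "CONECT    2    1    3    4",
    "CONECT    3    2",
    "CONECT    4    2",
]
print(sorted(parse_conect(records)))  # [(1, 2), (2, 3), (2, 4)]
```

Each bond is usually listed from both ends, so storing the pair in sorted order lets the set drop the duplicate.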
[Rdkit-discuss] incorrect stereochemistry
Hi All

In a recent list of about 100,000 SMILES, I ran into 512 that caused some problems. Basically, the stereochemistry of the canonicalized (isomericSmiles=True) SMILES gets reversed. I saw some discussion of this topic a while back, but it seems it had not been resolved.

[15:07:50] Warning: ring stereochemistry detected. The output SMILES is not canonical.

Any help or input on this? Some offending SMILES are below, along with the code I used to test this. I can provide a file of all 512 if you'd like. I'm using 2012.09.1, freshly compiled from svn and passing all tests.

TJ O'Donnell
---
from rdkit import Chem
import sys

for line in sys.stdin:
    smi = line.split(None, 1)[0]
    mol = Chem.MolFromSmiles(smi)
    if mol:
        print(smi)
        print(Chem.MolToSmiles(mol, isomericSmiles=True))
    else:
        print("can't parse smiles")

-- my truncated input --
CC1(c2cc(C(F)(F)F)cc(C(F)(F)F)c2)CCN([C@@]2(c3c3)CC[C@H](N3CCN(c4c4Cl)C(=O)C3)CC2)C1=O
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1

-- my truncated output; input SMILES/output SMILES in pairs of lines --
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@H]1CC[C@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@H]1OO[C@H]((=O)c2c2)OO1)c1c1
CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
C(=O)N[C@@]1(C(=O)N[C@H
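One caveat helps when reading input/output pairs like the ones above. For two SMILES that are identical apart from their @/@@ tags, inverting every tag describes the mirror image, and for an achiral molecule such as a cis- or trans-1,4-disubstituted cyclohexane the mirror image is the same compound; inverting only some of the tags changes the relative configuration. The stdlib sketch below sorts pairs into those cases by plain string comparison. It is a heuristic written for illustration (the helper names are hypothetical, not an RDKit API), it cannot say anything about pairs whose atom order changed, and deciding whether a given molecule is actually achiral still needs a cheminformatics toolkit.

```python
import re

# A bracket atom such as [C@@H] or [nH]; tetrahedral tags only occur inside brackets.
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")


def chirality_tags(smiles):
    """Return the '@'/'@@' tags in the order they appear in the SMILES string."""
    tags = []
    for atom in BRACKET_ATOM.findall(smiles):
        if "@@" in atom:
            tags.append("@@")
        elif "@" in atom:
            tags.append("@")
    return tags


def strip_chirality(smiles):
    """Remove '@' characters from bracket atoms, leaving the rest of the string intact."""
    return BRACKET_ATOM.sub(lambda m: m.group(0).replace("@", ""), smiles)


def compare_pair(smi_in, smi_out):
    """Classify a round-tripped SMILES pair by string-level changes in its chirality tags."""
    if strip_chirality(smi_in) != strip_chirality(smi_out):
        return "reordered"  # atom order differs, so tags cannot be compared position by position
    tags_in, tags_out = chirality_tags(smi_in), chirality_tags(smi_out)
    if len(tags_in) != len(tags_out):
        return "tags added or lost"
    if tags_in == tags_out:
        return "same"
    flipped = [a != b for a, b in zip(tags_in, tags_out)]
    return "all flipped" if all(flipped) else "partly flipped"


print(compare_pair("N[C@H]1CC[C@@H](O)CC1", "N[C@@H]1CC[C@H](O)CC1"))   # all flipped
print(compare_pair("N[C@H]1CC[C@@H](O)CC1", "N[C@@H]1CC[C@@H](O)CC1"))  # partly flipped
```

Feeding it consecutive lines of the output file would separate pairs that may only be mirror-image descriptions from pairs where the cis/trans relationship really changed.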
Re: [Rdkit-discuss] postgresql
There are binaries available at http://www.postgresql.org/download/ and a nice wiki at http://wiki.postgresql.org/wiki/Main_Page

The Postgres community is great; check out the mailing lists at http://www.postgresql.org/community/

TJ
-
TJ O'Donnell, Ph.D.
President, gNova Inc.
t...@gnova.com

On Wed, Jun 1, 2011 at 6:33 AM, Peter Schmidtke pschmid...@ub.edu wrote:

Hey Paul, hope you are fine ;) What system/architecture are you using?
++
Peter

On 01/06/2011, at 15:31, paul.czodrow...@merck.de wrote:

Dear rdkitters,

I would like to install PostgreSQL/SQLite. Could anyone point me to a good tutorial on how to set up such a system? I know how to use Google, but maybe you guys are faster... :)

Paul

Peter Schmidtke
- PhD Student
Department of Physical Chemistry
School of Pharmacy
University of Barcelona
Barcelona, Spain
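Paul's question also mentions SQLite. Unlike PostgreSQL, SQLite needs no server, and Python ships the sqlite3 module in its standard library, so a first experiment needs no installation at all. A minimal sketch, assuming a hypothetical `compounds` table that stores names and SMILES as plain text:

```python
import sqlite3

# In-memory database for illustration; pass a filename instead to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, name TEXT, smiles TEXT)")
conn.executemany(
    "INSERT INTO compounds (name, smiles) VALUES (?, ?)",
    [("ethanol", "CCO"), ("benzene", "c1ccccc1"), ("acetic acid", "CC(=O)O")],
)
conn.commit()

# Plain text matching on the SMILES column (rows containing '='); NOT a substructure search.
rows = conn.execute(
    "SELECT name FROM compounds WHERE smiles LIKE ? ORDER BY name", ("%=%",)
).fetchall()
print(rows)  # [('acetic acid',)]
conn.close()
```

The parameterized `?` placeholders keep the values out of the SQL text; anything chemistry-aware (substructure or similarity queries) would still need a toolkit on top.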
Re: [Rdkit-discuss] Contractors working with the RDKit?
I am willing and able to do consulting and contract programming with the RDKit, in either Python or C++: http://gnova.com

TJ
TJ O'Donnell, Ph.D.
President, gNova, Inc.
t...@gnova.com

On Sat, Mar 19, 2011 at 10:46 PM, Greg Landrum greg.land...@gmail.com wrote:

Dear all,

I was recently asked if there was anyone out there who was able to do contract development work with or on the RDKit. It's a good question, but I didn't have a good answer handy, so I'm asking here. If you currently do, or are willing to do, contract development work either extending the RDKit or developing new tools based on the RDKit, please reply to this thread. It would be helpful if you indicated your comfort level on both the C++ and Python sides. If there's sufficient interest/response, I'd be happy to include a section either on rdkit.org or on the wiki with names/links.

Thanks,
-greg
Re: [Rdkit-discuss] can't kekulize smiles generated by Chem.MolToSmiles
Hi Greg,

As usual, thanks for your quick response. Yes, these were big molecules. Let me know if you'd like me to try out any changes; I can recompile changes from Subversion easily now. I discovered these four examples using 1/10 of the ChEMBL database and can try any new code changes on the entire set of 600K molecules.

TJ

On Sun, Jan 9, 2011 at 10:51 PM, Greg Landrum greg.land...@gmail.com wrote:

Hi TJ,

On Mon, Jan 10, 2011 at 2:37 AM, TJ O'Donnell t...@acm.org wrote:

Thanks Greg. I compiled in the changes and that molfile works fine now, but here are four new examples of molfiles that convert to mol and SMILES just fine, yet the resulting SMILES won't parse properly back to a mol. Can you take a look?

Thanks for finding another good bug. The problem here is caused, as you probably guessed, by the size of the molecules (specifically, by the fact that more than 50 rings were open at one point during the generation of the SMILES). I will get it fixed for the release.

Best Regards,
-greg
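Greg traces this bug to more than 50 ring closures being open at the same time while the SMILES is written. In SMILES, ring-closure labels are single digits 1-9 or two-digit labels written %10 to %99, and a label can be reused once its ring has closed. The stdlib sketch below measures the peak number of simultaneously open ring-closure labels in a SMILES string, which could help flag molecules of the kind Greg describes. It is a string scanner written for illustration (`max_open_ring_closures` is hypothetical, not an RDKit function) and assumes well-formed SMILES using only single-digit and %nn labels.

```python
def max_open_ring_closures(smiles):
    """Return the largest number of ring-closure labels open at once in a SMILES string."""
    open_labels = set()
    peak = 0
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Digits inside bracket atoms are isotopes, H counts or charges, not ring closures.
            i = smiles.index("]", i) + 1
            continue
        if ch == "%":
            label = smiles[i + 1:i + 3]  # two-digit label such as %12
            i += 3
        elif ch.isdigit():
            label = ch
            i += 1
        else:
            i += 1
            continue
        if label in open_labels:
            open_labels.remove(label)  # second occurrence closes the ring
        else:
            open_labels.add(label)  # first occurrence opens it
            peak = max(peak, len(open_labels))
    return peak


print(max_open_ring_closures("c1ccc2ccccc2c1"))  # 2 (naphthalene)
print(max_open_ring_closures("[13CH3]C1CC1"))    # 1 (the 13 is an isotope, not a ring label)
```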
[Rdkit-discuss] can't kekulize smiles generated by Chem.MolToSmiles
I've stumbled onto a molfile which is read properly (MolFromMolBlock) and produces a proper SMILES (MolToSmiles), but the generated SMILES fails in Chem.MolFromSmiles. Can you help figure this one out? I've attached the molfile in question. Here is a simple script I used to show this issue:

from rdkit import rdBase
from rdkit import Chem
import sys

print(rdBase.boostVersion)
print(rdBase.rdkitVersion)
mb = sys.stdin.read()
mol = Chem.MolFromMolBlock(mb)
if mol:
    smi = Chem.MolToSmiles(mol, isomericSmiles=True)
    print(smi)
    newmol = Chem.MolFromSmiles(smi)

and the result I get:

python rdmol.py < 254080.mol
1_40
2010.12.1
[Cl-].CC(C)(C)c1[Te+]c(C(C)(C)C)cc(/C=C/C=C2C=C(C(C)(C)C)OC(C(C)(C)C)=C2)c1
[18:13:52] Can't kekulize mol

I just rebuilt from the Subversion source; not sure why this version shows as 2010.12.1.

URL: https://rdkit.svn.sourceforge.net/svnroot/rdkit/trunk
Repository Root: https://rdkit.svn.sourceforge.net/svnroot/rdkit
Repository UUID: 19320e9b-7711-0410-929e-f4fff3a11e9f
Revision: 1611
Node Kind: directory
Schedule: normal
Last Changed Author: glandrum
Last Changed Rev: 1611
Last Changed Date: 2011-01-05 00:45:35 -0800 (Wed, 05 Jan 2011)

Thanks,
TJ O'Donnell

Attachment: 254080.mol
Re: [Rdkit-discuss] Question: modifying default parameters for the RDKit fingerprint?
Hi Greg,

No objection here. I've been using 1024 bits with 2 bits per hash. Are you still using 2048 for the default size?

TJ O'Donnell

On Tue, Dec 28, 2010 at 11:33 PM, Greg Landrum greg.land...@gmail.com wrote:

Dear all,

As I mentioned in an earlier message (http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01430.html), the default parameters for the RDKit fingerprint end up setting far too many bits for drug-like molecules. The result is similarity values that are generally too high, and more frequent cases of molecules that are similar to each other only because of bit collisions. The easy solution to this problem is to decrease the number of bits set per path found (the nBitsPerHash parameter) from 4 to 2. I propose doing this for the Q4 2010 release of the RDKit. The downside is that fingerprints generated with that release will not be compatible with fingerprints from earlier releases unless you specify nBitsPerHash=4 yourself. The upside is a much more useful similarity fingerprint.

Any objections to me making this change?

-greg
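Greg's argument is that setting more bits per path saturates the fingerprint, so molecules share bits by collision and their similarity is inflated. The toy model below illustrates that effect with the standard library only. It is not the RDKit fingerprint: the "paths" are hypothetical string labels, the MD5-derived bit positions are an assumption made for illustration, and 2048 bits is simply the size TJ mentions.

```python
import hashlib


def toy_fingerprint(paths, n_bits=2048, bits_per_hash=2):
    """Toy path fingerprint: each path sets `bits_per_hash` bits chosen by hashing."""
    on_bits = set()
    for path in paths:
        for k in range(bits_per_hash):
            # Derive the k-th bit position for this path from an MD5 digest (illustrative only).
            digest = hashlib.md5(f"{path}:{k}".encode()).hexdigest()
            on_bits.add(int(digest, 16) % n_bits)
    return on_bits


def tanimoto(a, b):
    """Tanimoto similarity of two sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical molecules: 300 paths each, 100 of them shared.
paths_a = [f"path{i}" for i in range(0, 300)]
paths_b = [f"path{i}" for i in range(200, 500)]
print("path-level Tanimoto:", round(tanimoto(set(paths_a), set(paths_b)), 3))  # 0.2

for bits in (2, 4):
    fa = toy_fingerprint(paths_a, bits_per_hash=bits)
    fb = toy_fingerprint(paths_b, bits_per_hash=bits)
    print(f"bits per path={bits}: density={len(fa) / 2048:.2f}, Tanimoto={tanimoto(fa, fb):.3f}")
```

In this model both bit-level similarities come out above the path-level 0.2 because of collisions, and the gap grows with four bits per path, which is the behaviour Greg wants to reduce.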
[Rdkit-discuss] reading tag data from string, not file
I can see how to read an SD file using SDMolSupplier and use mol.GetProp() to get the tag data from the file. But I have each molblock (the chunk of lines between $$$$ delimiters in an SDF file) in a separate string. I don't see a way to get properties from that molblock string, or even better from mol = Chem.MolFromMolBlock(molblock). E.g. mol.GetPropNames() returns an empty list (or just the private and computed props with mol.GetPropNames(True, True)).

Can you give me some hints on how I might get the property tag data from a molblock string?

TJ O'Donnell
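A mol block proper ends at its `M  END` line; the `> <NAME>` data items TJ is after belong to the SD-file layer that follows it, which is presumably why MolFromMolBlock reports no properties. A minimal stdlib sketch, assuming each string holds one complete SD record (mol block plus data items), that pulls those items into a dict. `parse_sd_tags` is a hypothetical helper, not an RDKit function; its results could then be attached to the molecule with mol.SetProp(name, value).

```python
def parse_sd_tags(record):
    """Return the SD data items of one SD record as a {name: value} dict."""
    lines = record.splitlines()
    # Data items follow the connection table, so start scanning after 'M  END'.
    ends = [n for n, line in enumerate(lines) if line.startswith("M  END")]
    if not ends:
        return {}
    tags = {}
    i = ends[0] + 1
    while i < len(lines):
        line = lines[i]
        start = line.find("<")
        end = line.find(">", start + 1) if start != -1 else -1
        if line.startswith(">") and end != -1:
            name = line[start + 1:end]  # field name between the angle brackets
            values = []
            i += 1
            # Value lines run until a blank line or the $$$$ record delimiter.
            while i < len(lines) and lines[i].strip() not in ("", "$$$$"):
                values.append(lines[i])
                i += 1
            tags[name] = "\n".join(values)
        else:
            i += 1
    return tags


# Hypothetical SD record: ethanol heavy atoms plus two data items.
record = "\n".join([
    "ethanol",
    "  hypothetical example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
    "> <NAME>",
    "ethanol",
    "",
    "> <COMMENT>",
    "first line",
    "second line",
    "",
    "$$$$",
])
print(parse_sd_tags(record))  # {'NAME': 'ethanol', 'COMMENT': 'first line\nsecond line'}
```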
Re: [Rdkit-discuss] MolFromMolBlock never returns
Hi Greg

Thanks for the quick reply. Sure enough, the latest version of the RDKit fixes the problem I was having. I should have tried that first! Now that I have the build issues worked out, an svn update and make install is pretty quick.

TJ

On Sun, Dec 26, 2010 at 8:31 PM, Greg Landrum greg.land...@gmail.com wrote:

Hi TJ,

2010/12/24 TJ O'Donnell t...@acm.org:

I have a mol file that causes MolFromMolBlock to get stuck. I reproduced this problem with this simple Python script (below). I've attached the problem input molfile. I got the file from the ChEMBL 08 download. Another large molfile finishes in seconds, but I stopped this one after about 1 minute. Can you see what might be the problem? I'm afraid I am not using the most recent version, but one I built last July.

There have been some fixes related to handling of large molecules since July. Certainly the current state of the code from svn (and probably the last release, though I haven't tried this) handles your SD file without problems or huge delays (less than half a second on my machine).

Best Regards,
-greg
[Rdkit-discuss] MolFromMolBlock never returns
I have a mol file that causes MolFromMolBlock to get stuck. I reproduced this problem with the simple Python script below. I've attached the problem input molfile; I got it from the ChEMBL 08 download. Another large molfile finishes in seconds, but I stopped this one after about 1 minute. Can you see what might be the problem?

I'm afraid I am not using the most recent version, but one I built last July. From Subversion:

URL: https://rdkit.svn.sourceforge.net/svnroot/rdkit/trunk
Repository Root: https://rdkit.svn.sourceforge.net/svnroot/rdkit
Repository UUID: 19320e9b-7711-0410-929e-f4fff3a11e9f
Revision: 1450
Node Kind: directory
Schedule: normal
Last Changed Author: glandrum
Last Changed Rev: 1450
Last Changed Date: 2010-07-08 21:15:09 -0700 (Thu, 08 Jul 2010)

Thanks,
TJ O'Donnell, Ph.D.
President, gNova, Inc.

from rdkit import Chem
import sys

mb = sys.stdin.read()
mol = Chem.MolFromMolBlock(mb)
if mol:
    print(len(mb), mol)
    print(Chem.MolToSmiles(mol, isomericSmiles=False))

CDK3/26/10,13:38 469504 0 0 0 0 0 0 0 0999 V2000 10.8216 -24.66950. N 0 0 0 0 0 0 0 0 0 0 0 0 10.8142 -30.38920. C 0 0 0 0 0 0 0 0 0 0 0 0 11.2281 -29.67280. C 0 0 0 0 0 0 0 0 0 0 0 0 12.0536 -29.67410. C 0 0 0 0 0 0 0 0 0 0 0 0 12.4664 -30.39150. C 0 0 0 0 0 0 0 0 0 0 0 0 12.0474 -31.10860. C 0 0 0 0 0 0 0 0 0 0 0 0 11.2233 -31.10370. C 0 0 0 0 0 0 0 0 0 0 0 0 10.8074 -31.81630. C 0 0 0 0 0 0 0 0 0 0 0 0 9.9824 -31.81230. N 0 0 0 0 0 0 0 0 0 0 0 0 9.5730 -31.09550. C 0 0 0 0 0 0 0 0 0 0 0 0 8.7480 -31.09160. C 0 0 0 0 0 0 0 0 0 0 0 0 9.9889 -30.38300. O 0 0 0 0 0 0 0 0 0 0 0 0 8.3321 -31.80410. C 0 0 0 0 0 0 0 0 0 0 0 0 8.3389 -30.37520. N 0 0 0 0 0 0 0 0 0 0 0 0 7.5139 -30.37130. C 0 0 0 0 0 0 0 0 0 0 0 0 7.1048 -29.65480. C 0 0 0 0 0 0 0 0 0 0 0 0 7.0980 -31.08380. O 0 0 0 0 0 0 0 0 0 0 0 0 8.7415 -32.52090. C 0 0 0 0 0 0 0 0 0 0 0 0 9.7280 -33.41670. N 0 0 0 0 0 0 0 0 0 0 0 0 9.0115 -33.82580. C 0 0 0 0 0 0 0 0 0 0 0 0 8.4011 -33.27080. N 0 0 0 0 0 0 0 0 0 0 0 0 9.5665 -32.52490.
C 0 0 0 0 0 0 0 0 0 0 0 0 10.4016 -32.51210. C 0 0 0 0 0 0 0 0 0 0 0 0 11.2308 -32.50580. C 0 0 0 0 0 0 0 0 0 0 0 0 13.2914 -30.39420. C 0 0 0 0 0 0 0 0 0 0 0 0 13.7019 -29.67400. N 0 0 0 0 0 0 0 0 0 0 0 0 14.1221 -30.37920. C 0 0 0 0 0 0 0 0 0 0 0 0 13.7017 -28.85570. C 0 0 0 0 0 0 0 0 0 0 0 0 13.7238 -31.10640. C 0 0 0 0 0 0 0 0 0 0 0 0 11.6317 -31.78480. C 0 0 0 0 0 0 0 0 0 0 0 0 12.8988 -31.11770. C 0 0 0 0 0 0 0 0 0 0 0 0 12.4962 -31.83780. N 0 0 0 0 0 0 0 0 0 0 0 0 12.8916 -32.56190. C 0 0 0 0 0 0 0 0 0 0 0 0 13.7164 -32.58150. C 0 0 0 0 0 0 0 0 0 0 0 0 12.4622 -33.26630. O 0 0 0 0 0 0 0 0 0 0 0 0 14.1457 -31.87700. N 0 0 0 0 0 0 0 0 0 0 0 0 14.1117 -33.30560. C 0 0 0 0 0 0 0 0 0 0 0 0 14.9705 -31.89670. C 0 0 0 0 0 0 0 0 0 0 0 0 15.3999 -31.19220. C 0 0 0 0 0 0 0 0 0 0 0 0 15.3659 -32.62070. O 0 0 0 0 0 0 0 0 0 0 0 0 13.6824 -34.01000. C 0 0 0 0 0 0 0 0 0 0 0 0 13.9991 -34.77120. C 0 0 0 0 0 0 0 0 0 0 0 0 13.3731 -35.30850. N 0 0 0 0 0 0 0 0 0 0 0 0 12.6686 -34.87910. C 0 0 0 0 0 0 0 0 0 0 0 0 12.8593 -34.07650. N 0 0 0 0 0 0 0 0 0 0 0 0 12.9851 -28.44690. O 0 0 0 0 0 0 0 0 0 0 0 0 14.4140 -28.43950. C 0 0 0 0 0 0 0 0 0 0 0 0 14.4097 -27.61450. N 0 0 0 0 0 0 0 0 0 0 0 0 15.1309 -28.84870. C 0 0 0 0 0 0 0 0 0 0 0 0 15.8432 -28.43250. C 0 0 0 0 0 0 0 0 0 0 0 0 15.9284 -27.61510. N 0 0 0 0 0 0 0 0 0 0 0 0 16.7345 -27.43940. C 0 0 0 0 0 0 0 0 0 0 0 0 17.1510 -28.15220. N 0 0 0 0 0 0 0 0 0 0 0 0 16.6021 -28.76800. C 0 0 0 0 0 0 0 0 0 0 0 0 13.6931 -27.20570. C 0 0 0 0 0 0 0 0 0 0 0 0 13.6888 -26.38070. C 0 0 0 0 0 0 0 0 0 0 0 0
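The counts line of the quoted molfile reads "469504 0 0 ... 0999 V2000". In the V2000 format the first two fields are fixed-width, three characters each (atom count in columns 1-3, bond count in columns 4-6), so it describes 469 atoms and 504 bonds, which fits TJ's description of a large molfile. A minimal stdlib sketch of reading those two fields; `parse_counts_line` is a hypothetical helper written for illustration.

```python
def parse_counts_line(line):
    """Return (atom_count, bond_count) from a V2000 molfile counts line."""
    if not line.rstrip().endswith("V2000"):
        raise ValueError("not a V2000 counts line: %r" % line)
    # Fixed-width fields: atoms in columns 1-3, bonds in columns 4-6.
    return int(line[0:3]), int(line[3:6])


print(parse_counts_line("469504  0  0  0  0  0  0  0  0999 V2000"))  # (469, 504)
print(parse_counts_line("  3  2  0  0  0  0  0  0  0  0999 V2000"))  # (3, 2)
```

Reading by column rather than splitting on whitespace matters here: once both counts reach three digits there is no space between them.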
[Rdkit-discuss] compile issues
Hi Greg

I'm trying to build the RDKit on a 64-bit Red Hat system:

g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)

I built Boost 1.43 and the latest flex, and got up to this point building the RDKit:

[ 82%] Building CXX object Code/GraphMol/SLNParse/CMakeFiles/SLNParse.dir/lex.yysln.cpp.o
Linking CXX shared library libSLNParse.so
/usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libboost_regex.a(cpp_regex_traits.o): relocation R_X86_64_32S against `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_empty_rep_storage' can not be used when making a shared object; recompile with -fPIC
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libboost_regex.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[2]: *** [Code/GraphMol/SLNParse/libSLNParse.so] Error 1
make[1]: *** [Code/GraphMol/SLNParse/CMakeFiles/SLNParse.dir/all] Error 2
make: *** [all] Error 2

Can you help?

Thanks,
TJ O'Donnell