[Rdkit-discuss] RDKit and Google Summer of Code 2018
Dear all,

We've been invited again to participate in the OpenChemistry application for Google Summer of Code. In order to participate we need ideas for projects and mentors to go along with them. The current list of RDKit ideas is being maintained here:
http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
(Note: at the point that I'm pressing "send", that's still a copy of last year's project ideas.)

If you're willing to be a mentor (please ask me about the ~5 hours/week required here) or have ideas, please reply to this thread.

Best,
-greg

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Behavior of ETKDG / EmbedMultipleConfs
Hi Andy,

If -1 is used for the random number seed, the RDKit will use the current date (including seconds) as the seed (Greg, please correct me if I'm wrong). Therefore, you get a different seed every time you run the script. If you use a fixed seed, you will generate the same conformations every time you run it.

Note that if pruneRmsThresh > 0, the generated conformers will be pruned, i.e. conformers with an RMS < cutoff to any previous conformer will be discarded. As this happens at the very end of the conformer generation routine, no additional conformers will be generated to replace the discarded ones. This is why you get a varying number of conformers.

I have run your script and I get the same weird third conformation. This should certainly not happen. I will look into it.

Best,
Sereina

> On 12 Jan 2018, at 19:17, Andy Jennings wrote:
>
> Hi RDKitters,
>
> Whilst looking at generating some conformations of molecules using the ETKDG
> method with EmbedMultipleConfs I've come across some strange (to me) behavior.
>
> When I generate conformations of some molecules with the randomSeed as -1 the
> result is a variable number of conformations. That's not the strangest aspect
> though - some of the conformations are quite bizarre based upon any geometry
> rules I can think of. However, when the randomSeed is set to a fixed number
> the odd behavior goes away and I get only reasonable conformations.
>
> To illustrate here is some code (please no criticism of my terrible style!):
>
> ### CODE ###
> from rdkit import Chem
> from rdkit.Chem import AllChem
> import sys
>
> acamide = Chem.MolFromSmiles('O=C(NC=C)c1c1')
> ETKDG = 1
> _seed = -1
> m = Chem.AddHs(acamide)
> n = 3
> ps = AllChem.ETKDG()
> ps.pruneRmsThresh = 0.5
> ps.numThreads = 0
> ps.randomSeed = _seed
> fixIt = 0
> for i in range(0,100):
>     ids = AllChem.EmbedMultipleConfs(m, n, ps)
>     if fixIt:
>         for _id in ids: AllChem.UFFOptimizeMolecule(m, confId = _id)
>     sys.stderr.write('%d,' % len(ids))
>     if len(ids) > 2:
>         outStream = Chem.SDWriter('test.sdf')
>         for _id in ids:
>             outStream.write(m,confId = _id)
>         outStream.flush()
>         outStream.close()
>         sys.stderr.write('\n')
>         break
>
> ### END CODE ###
>
> This takes the smiles string for a simple acrylamide and generates a max of 3
> conformations for the molecule. The loop runs 100 times and halts when 3
> conformations are found - which is the sign of a bad conformation being
> generated. When I run this the number of conformations generated each time
> varies between 1-3 and it does so differently from run to run.
>
> For instance:
> run #1:
> 2,2,1,1,2,2,2,2,2,2,1,2,2,1,2,1,2,1,2,2,1,2,1,1,1,2,2,2,2,2,1,2,2,2,2,2,2,2,1,2,2,1,2,2,2,2,1,1,2,2,3,
> run #2: 2,1,2,2,2,1,1,3,
> run #3: 2,2,2,1,2,2,2,2,1,2,2,1,2,1,2,2,3,
> and so on
>
> When I visually inspect test.sdf that results from a generation of 3
> conformers I find that one of the conformations has a very odd amide nitrogen
> geometry - almost linear between the heavy atoms.
>
> If I change _seed to a number such as '1' I get a single conformation for
> every run.
>
> If I implement the UFF optimization (with fixIt = 1) then I'll still get
> multiple conformations but they all look reasonable.
>
> So, I'm not sure if there is some systematic problem here or I'm just failing
> to understand the appropriate way to implement this form of conformational
> search. Any insights are welcome.
>
> Best,
> Andy
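
To make the pruning explanation above concrete, here is a minimal, stdlib-only sketch. It is not the RDKit implementation: the two template geometries, the helper names (toy_embed, rms, prune), and the use of an unaligned coordinate RMS are all assumptions chosen for illustration. It only shows why pruning at the end, with no replacement of discarded conformers, yields a varying count when the seed is not fixed.

```python
import math
import random

# Two hypothetical "basins" for a 4-atom toy molecule. They differ only in
# the position of the last atom, by far more than the noise added below.
TEMPLATES = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
]


def toy_embed(n_confs, seed, noise=0.05):
    """Return n_confs noisy copies of randomly chosen templates.

    Assumption: seed == -1 is mimicked with random.Random(None), which is
    seeded from the system; any other value gives reproducible output.
    """
    rng = random.Random(None if seed == -1 else seed)
    confs = []
    for _ in range(n_confs):
        template = rng.choice(TEMPLATES)
        confs.append([tuple(c + rng.uniform(-noise, noise) for c in atom)
                      for atom in template])
    return confs


def rms(conf_a, conf_b):
    """Plain coordinate RMS over atoms (no alignment, unlike a real tool)."""
    sq = sum((p - q) ** 2
             for atom_a, atom_b in zip(conf_a, conf_b)
             for p, q in zip(atom_a, atom_b))
    return math.sqrt(sq / len(conf_a))


def prune(confs, thresh):
    """Keep a conformer only if it is >= thresh from every conformer kept so far.

    Discarded conformers are not replaced, so fewer than len(confs) may remain.
    """
    kept = []
    for conf in confs:
        if all(rms(conf, k) >= thresh for k in kept):
            kept.append(conf)
    return kept


if __name__ == "__main__":
    # Fixed seeds repeat exactly; -1 may give a different count on each run.
    for seed in (1, 1, -1, -1):
        print(seed, len(prune(toy_embed(3, seed), thresh=0.5)))
```

In this toy setup, asking for 3 conformers returns 1 or 2 after pruning, depending on how many basins happened to be sampled, which is the same mechanism that makes the counts in the quoted runs vary.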
Re: [Rdkit-discuss] Explicit hydrogens in substructure search
On Thu, Jan 11, 2018 at 7:23 PM, Andrey wrote:

> I managed to get it working for Python wrapper. Could you please give me
> an idea how to implement it for Postgres cartridge?

I don't understand the question. The blog post I pointed you to earlier in the thread:
http://rdkit.blogspot.ch/2016/07/tuning-substructure-queries-ii.html
focuses on using this functionality with the cartridge. Did that not work for you or are you looking for something different?

> Kind regards,
>
> Andrew
>
> 13.12.2017 08:58, Greg Landrum
> > On Tue, Dec 12, 2017 at 7:28 PM, Andrey wrote:
> >
> > > Does this depend on removeHs() function? I mean, to make MergeQueryHs()
> > > work, should I do removeHs=False first for all compounds in my database, to
> > > preserve implicit\explicit hydrogens in their structure?
> >
> > The MergeQueryHs() functionality is primarily intended to be used for
> > molecules where the Hs have been removed.
> >
> > -greg
Re: [Rdkit-discuss] mol file parsing, 3D or 2D
Hi Jason,

On Sun, Jan 14, 2018 at 8:23 PM, Jason Biggs wrote:

> Two questions about mol file conformer reading:
>
> Looking through the .mol files included for testing, I chose
> "Code/GraphMol/Depictor/test_data/7UPJ_spread.mol" at random.
>
> When I read in this file using the RDKit::MolFileToMol function, and then
> query its conformer's is3D() method, it returns true even though it is
> definitely a 2D depiction in the file. I'm not totally familiar with the
> MDL file specifications, so is there some flag I'm missing in the file?

There is an optional flag that can be present on the second line of the Mol file to indicate whether a set of coordinates is 2D or 3D. Here are two examples:

In [22]: print(Chem.MolToMolBlock(m,confId=0))

     RDKit          2D

  5  4  0  0  0  0  0  0  0  0999 V2000
    1.5000   -0.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.5000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -1.5000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  2  5  1  0
M  END

In [23]: print(Chem.MolToMolBlock(m,confId=1))

     RDKit          3D

  5  4  0  0  0  0  0  0  0  0999 V2000
   -0.1605    1.2383   -0.7086 F   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0476    0.1120    0.0663 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3542   -1.0307   -0.3007 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    1.6975   -0.6868   -0.2044 Br  0  0  0  0  0  0  0  0  0  0  0  0
   -0.1352    0.3672    1.1474 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  2  5  1  0
M  END

The RDKit assumes that the conformations it reads are 3D unless that flag is set to "2D".

> Second question,
>
> When I read in a file with a 3D conformer, and then later use
> compute2DCoords, followed by WedgeMolBonds, it adds wedges to non-chiral
> atoms. Is this by design?

No, this definitely should not happen. If it does, it's a bug. I'm guessing that what you're seeing is the wedges that were originally present in your mol file (at least in the example you provide below).

> It definitely does serve to convey 3D information from the file in the
> depiction, but I'd also like to know how to disable it if possible. Would
> running assignStereochemistry fix the issue?

Yes, if you ensure that the "cleanIt" argument is set. The fact that this isn't happening for you indicates that you are reading the molecules in without sanitizing them - the mol file parser calls assignStereochemistry() by default if you sanitize. Are you sure that you should be disabling sanitization?

-greg

> The mol file for the second question is pasted below, and here is the
> generated depiction,
>
> [image: Inline image 2]
>
> aspirin.mol
>
>  21 21  0  0  0
>    -2.2240   -1.4442   -0.4577 C   0  0  0  0  0
>    -2.1657   -0.0545   -0.5349 C   0  0  0  0  0
>    -0.9916    0.6085   -0.1694 C   0  0  0  0  0
>     0.1471   -0.0738    0.2764 C   0  0  0  0  0
>     0.0751   -1.4832    0.3390 C   0  0  0  0  0
>    -1.1052   -2.1532   -0.0188 C   0  0  0  0  0
>     1.2412   -2.2934    0.7925 C   0  0  0  0  0
>     2.4223   -1.7619    1.1727 O   0  0  0  0  0
>     1.1650   -3.5162    0.8364 O   0  0  0  0  0
>     1.2795    0.6233    0.5954 O   0  0  0  0  0
>     1.1005    1.7577    1.3258 C   0  0  0  0  0
>     2.4429    2.3635    1.6825 C   0  0  0  0  0
>     0.0255    2.2041    1.6578 O   0  0  0  0  0
>    -3.1430   -1.9775   -0.7500 H   0  0  0  0  0
>    -3.0382    0.5167   -0.8915 H   0  0  0  0  0
>    -0.9608    1.7083   -0.2479 H   0  0  0  0  0
>    -1.1740   -3.2520    0.0315 H   0  0  0  0  0
>     2.9869   -2.5132    1.4166 H   0  0  0  0  0
>     2.3142    3.3967    2.0773 H   0  0  0  0  0
>     3.1051    2.4141    0.7884 H   0  0  0  0  0
>     2.9391    1.7459    2.4657 H   0  0  0  0  0
>   1  2  2  0  0  0
>   1  6  1  0  0  0
>   1 14  1  0  0  0
>   2  3  1  0  0  0
>   2 15  1  0  0  0
>   3  4  2  0  0  0
>   3 16  1  0  0  0
>   4  5  1  0  0  0
>   4 10  1  0  0  0
>   5  6  2  0  0  0
>   5  7  1  0  0  0
>   6 17  1  0  0  0
>   7  8  1  0  0  0
>   7  9  2  0  0  0
>   8 18  1  0  0  0
>  10 11  1  1  0  0
>  11 12  1  0  0  0
>  11 13  2  0  0  0
>  12 19  1  0  0  0
>  12 20  1  6  0  0
>  12 21  1  1  0  0
> M  END
>
> Thanks,
>
> Jason
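
As a rough illustration of the 2D/3D flag discussed in this reply, the sketch below reads the dimension code from the second line of a mol block using only the standard library. It is not the RDKit parser. The function names and the header() demo helper are hypothetical, and the column position (characters 21-22 of line 2, after 2 initials, 8 program-name characters and 10 date characters) is taken from the usual V2000 header layout.

```python
def dimension_flag(mol_block):
    """Return '2D', '3D', or None from the second header line of a mol block."""
    lines = mol_block.splitlines()
    if len(lines) < 2:
        return None
    # Characters 21-22 (0-based slice 20:22) hold the optional dimension code.
    flag = lines[1][20:22].upper()
    return flag if flag in ("2D", "3D") else None


def treated_as_3d(mol_block):
    """Mirror the rule stated above: coordinates count as 3D unless flagged '2D'."""
    return dimension_flag(mol_block) != "2D"


def header(flag):
    """Hypothetical demo helper: blank title line, program line, blank comment line."""
    program_line = "  " + "RDKit".rjust(8) + " " * 10 + flag
    return "\n" + program_line + "\n\n"


if __name__ == "__main__":
    for flag in ("2D", "3D", ""):
        block = header(flag)
        print(repr(flag), dimension_flag(block), treated_as_3d(block))
```

A file whose header carries no dimension code, like the aspirin example in this thread, falls into the "treated as 3D" branch under this rule.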
[Rdkit-discuss] mol file parsing, 3D or 2D
Two questions about mol file conformer reading:

Looking through the .mol files included for testing, I chose "Code/GraphMol/Depictor/test_data/7UPJ_spread.mol" at random.

When I read in this file using the RDKit::MolFileToMol function, and then query its conformer's is3D() method, it returns true even though it is definitely a 2D depiction in the file. I'm not totally familiar with the MDL file specifications, so is there some flag I'm missing in the file?

Second question,

When I read in a file with a 3D conformer, and then later use compute2DCoords, followed by WedgeMolBonds, it adds wedges to non-chiral atoms. Is this by design? It definitely does serve to convey 3D information from the file in the depiction, but I'd also like to know how to disable it if possible. Would running assignStereochemistry fix the issue?

The mol file for the second question is pasted below, and here is the generated depiction,

[image: Inline image 2]

aspirin.mol

 21 21  0  0  0
   -2.2240   -1.4442   -0.4577 C   0  0  0  0  0
   -2.1657   -0.0545   -0.5349 C   0  0  0  0  0
   -0.9916    0.6085   -0.1694 C   0  0  0  0  0
    0.1471   -0.0738    0.2764 C   0  0  0  0  0
    0.0751   -1.4832    0.3390 C   0  0  0  0  0
   -1.1052   -2.1532   -0.0188 C   0  0  0  0  0
    1.2412   -2.2934    0.7925 C   0  0  0  0  0
    2.4223   -1.7619    1.1727 O   0  0  0  0  0
    1.1650   -3.5162    0.8364 O   0  0  0  0  0
    1.2795    0.6233    0.5954 O   0  0  0  0  0
    1.1005    1.7577    1.3258 C   0  0  0  0  0
    2.4429    2.3635    1.6825 C   0  0  0  0  0
    0.0255    2.2041    1.6578 O   0  0  0  0  0
   -3.1430   -1.9775   -0.7500 H   0  0  0  0  0
   -3.0382    0.5167   -0.8915 H   0  0  0  0  0
   -0.9608    1.7083   -0.2479 H   0  0  0  0  0
   -1.1740   -3.2520    0.0315 H   0  0  0  0  0
    2.9869   -2.5132    1.4166 H   0  0  0  0  0
    2.3142    3.3967    2.0773 H   0  0  0  0  0
    3.1051    2.4141    0.7884 H   0  0  0  0  0
    2.9391    1.7459    2.4657 H   0  0  0  0  0
  1  2  2  0  0  0
  1  6  1  0  0  0
  1 14  1  0  0  0
  2  3  1  0  0  0
  2 15  1  0  0  0
  3  4  2  0  0  0
  3 16  1  0  0  0
  4  5  1  0  0  0
  4 10  1  0  0  0
  5  6  2  0  0  0
  5  7  1  0  0  0
  6 17  1  0  0  0
  7  8  1  0  0  0
  7  9  2  0  0  0
  8 18  1  0  0  0
 10 11  1  1  0  0
 11 12  1  0  0  0
 11 13  2  0  0  0
 12 19  1  0  0  0
 12 20  1  6  0  0
 12 21  1  1  0  0
M  END

Thanks,

Jason
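
In the bond block of the aspirin file above, three bonds (10-11, 12-20 and 12-21) carry a nonzero fourth field. In the V2000 format that field is the bond stereo code for single bonds (1 = up/wedge, 4 = either, 6 = down/hash), so the file itself already contains wedge flags. The following is a small stdlib-only sketch for spotting such bonds; flagged_bonds and WEDGE_CODES are hypothetical names, and splitting on whitespace is an assumption that only holds for small molecules, since the real format uses fixed-width fields that can run together for large atom indices.

```python
# Single-bond stereo codes as used in the fourth field of a V2000 bond line.
WEDGE_CODES = {1: "wedge (up)", 4: "either", 6: "hash (down)"}


def flagged_bonds(bond_lines):
    """Return (begin_atom, end_atom, label) for bonds carrying a wedge code."""
    flagged = []
    for line in bond_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete bond line
        begin, end, _order, stereo = (int(f) for f in fields[:4])
        if stereo in WEDGE_CODES:
            flagged.append((begin, end, WEDGE_CODES[stereo]))
    return flagged


if __name__ == "__main__":
    # The last six bond lines of the aspirin file from this message.
    bonds = [
        "10 11 1 1 0 0",
        "11 12 1 0 0 0",
        "11 13 2 0 0 0",
        "12 19 1 0 0 0",
        "12 20 1 6 0 0",
        "12 21 1 1 0 0",
    ]
    for begin, end, label in flagged_bonds(bonds):
        print(begin, end, label)
```

None of the flagged bonds starts at a stereocenter (atom 10 is the ester oxygen, atom 12 a methyl carbon), which fits the observation that the wedges appear on non-chiral atoms.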