[Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-14 Thread Greg Landrum
Dear all,

We've been invited again to participate in the OpenChemistry application
for Google Summer of Code.

In order to participate we need ideas for projects and mentors to go along
with them.

The current list of RDKit ideas is being maintained here:
http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas

(Note: at the point that I'm pressing "send", that's still a copy of last
year's project ideas).

If you're willing to be a mentor (please ask me about the ~5 hours/week
required here) or have ideas, please reply to this thread.

Best,
-greg
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Behavior of ETKDG / EmbedMultipleConfs

2018-01-14 Thread Sereina
Hi Andy,

If -1 is used as the random number seed, the RDKit will use the current date
(including seconds) as the seed (Greg, please correct me if I’m wrong). You
therefore get a different seed every time you run the script. If you use a
fixed seed, you will generate the same conformations every time you run it.
Note that if pruneRmsThresh > 0, the generated conformers will be pruned, i.e.
conformers with an RMS < cutoff to any previous conformer will be discarded. As
this happens at the very end of the conformer generation routine, no additional
conformers are generated to replace the discarded ones. This is why you get a
varying number of conformers.
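
To make the two effects above concrete, here is a minimal toy sketch in plain
Python. It is not RDKit code: each "conformer" is reduced to a single number,
and the names generate and prune are hypothetical. It only assumes what is
described above, namely that a seed of -1 is replaced by a clock-based seed and
that pruning runs once at the end without replacing what it discards.

```python
import random
import time


def generate(n, seed):
    """Return n toy 'conformers', each reduced to one coordinate in [0, 1)."""
    # Assumption for this toy: -1 means "seed from the clock", else use the seed.
    rng = random.Random(time.time() if seed == -1 else seed)
    return [rng.random() for _ in range(n)]


def prune(confs, thresh):
    """Greedy pruning: drop any conformer closer than thresh to one already kept."""
    kept = []
    for c in confs:
        if all(abs(c - k) >= thresh for k in kept):
            kept.append(c)
    return kept  # discarded conformers are not replaced


# A fixed seed reproduces the same set; pruning may still shrink it.
assert generate(3, 1) == generate(3, 1)
print(len(prune(generate(3, 1), 0.2)), "of 3 kept with seed 1")
print(len(prune(generate(3, -1), 0.2)), "of 3 kept with a clock-based seed")
```

With the clock-based seed the number kept can change from run to run, which is
the same pattern as the 1-3 conformers reported in the original post.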

I have run your script and I get the same weird third conformation. This should 
certainly not happen. I will look into it.

Best,
Sereina


> On 12 Jan 2018, at 19:17, Andy Jennings wrote:
> 
> Hi RDKitters,
> 
> Whilst looking at generating some conformations of molecules using the ETKDG 
> method with EmbedMultipleConfs I've come across some strange (to me) behavior.
> 
> When I generate conformations of some molecules with the randomSeed as -1 the 
> result is a variable number of conformations. That's not the strangest aspect 
> though - some of the conformations are quite bizarre based upon any geometry 
> rules I can think of. However, when the randomSeed is set to a fixed number 
> the odd behavior goes away and I get only reasonable conformations.
> 
> To illustrate here is some code (please no criticism of my terrible style!):
> 
> ### CODE ###
> from rdkit import Chem
> from rdkit.Chem import AllChem
> import sys
>
> # NOTE: the ring in this SMILES was garbled in the archived copy ('c1c1');
> # 'c1ccccc1' is the assumed original.
> acamide = Chem.MolFromSmiles('O=C(NC=C)c1ccccc1')
> ETKDG = 1
> _seed = -1
> m = Chem.AddHs(acamide)
> n = 3
> ps = AllChem.ETKDG()
> ps.pruneRmsThresh = 0.5
> ps.numThreads = 0
> ps.randomSeed = _seed
> fixIt = 0
> for i in range(0,100):
>     ids = AllChem.EmbedMultipleConfs(m, n, ps)
>     if fixIt:
>         for _id in ids: AllChem.UFFOptimizeMolecule(m, confId = _id)
>     sys.stderr.write('%d,' % len(ids))
>     if len(ids) > 2:
>         outStream = Chem.SDWriter('test.sdf')
>         for _id in ids:
>             outStream.write(m,confId = _id)
>         outStream.flush()
>         outStream.close()
>         sys.stderr.write('\n')
>         break
>
> ### END CODE ###
> 
> 
> This takes the smiles string for a simple acrylamide and generates a max of 3 
> conformations for the molecule. The loop runs 100 times and halts when 3 
> conformations are found - which is the sign of a bad conformation being 
> generated. When I run this the number of conformations generated each time 
> varies between 1-3 and it does so differently from run to run.
> 
> For instance:
> run #1: 
> 2,2,1,1,2,2,2,2,2,2,1,2,2,1,2,1,2,1,2,2,1,2,1,1,1,2,2,2,2,2,1,2,2,2,2,2,2,2,1,2,2,1,2,2,2,2,1,1,2,2,3,
> run #2: 2,1,2,2,2,1,1,3,
> run #3: 2,2,2,1,2,2,2,2,1,2,2,1,2,1,2,2,3,
> and so on
> 
> When I visually inspect test.sdf that results from a generation of 3 
> conformers I find that one of the conformations has a very odd amide nitrogen 
> geometry - almost linear between the heavy atoms.
> 
> If I change _seed to a number such as '1' I get a single conformation for 
> every run.
> 
> If I implement the UFF optimization (with fixIt = 1) then I'll still get 
> multiple conformations but they all look reasonable.
> 
> So, I'm not sure if there is some systematic problem here or I'm just failing 
> to understand the appropriate way to implement this form of conformational 
> search. Any insights are welcome.
> 
> Best,
> Andy


Re: [Rdkit-discuss] Explicit hydrogens in substructure search

2018-01-14 Thread Greg Landrum
On Thu, Jan 11, 2018 at 7:23 PM, Andrey wrote:

> I managed to get it working with the Python wrapper. Could you please give me
> an idea of how to implement it for the Postgres cartridge?
>
I don't understand the question. The blog post I pointed you to earlier in
the thread:
http://rdkit.blogspot.ch/2016/07/tuning-substructure-queries-ii.html
focuses on using this functionality with the cartridge.

Did that not work for you or are you looking for something different?


> Kind regards,
>
> Andrew
>
>
>
> 13.12.2017 08:58, Greg Landrum wrote:
> > On Tue, Dec 12, 2017 at 7:28 PM, Andrey wrote:
> >
> > >
> > > Does this depend on the removeHs() function? I mean, to make MergeQueryHs()
> > > work, should I set removeHs=False first for all compounds in my database,
> > > to preserve implicit/explicit hydrogens in their structure?
> > >
> >
> > The MergeQueryHs() functionality is primarily intended to be used for
> > molecules where the Hs have been removed.
> >
> > -greg


Re: [Rdkit-discuss] mol file parsing, 3D or 2D

2018-01-14 Thread Greg Landrum
Hi Jason,

On Sun, Jan 14, 2018 at 8:23 PM, Jason Biggs wrote:

> Two questions about mol file conformer reading:
>
> I was looking through the .mol files included for testing and chose
> "Code/GraphMol/Depictor/test_data/7UPJ_spread.mol" at random.
>
> When I read in this file using the RDKit::MolFileToMol function, and then
> query its conformer's is3D() method, it returns true even though it is
> definitely a 2D depiction in the file.  I'm not totally familiar with the
> MDL file specifications, so is there some flag I'm missing in the file?
>

There is an optional flag that can be present on the second line of the Mol
file to indicate whether a set of coordinates is 2D or 3D.
Here are two examples:

In [22]: print(Chem.MolToMolBlock(m,confId=0))

     RDKit          2D

  5  4  0  0  0  0  0  0  0  0999 V2000
    1.5000   -0.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.5000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -1.5000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  2  5  1  0
M  END


In [23]: print(Chem.MolToMolBlock(m,confId=1))

     RDKit          3D

  5  4  0  0  0  0  0  0  0  0999 V2000
   -0.1605    1.2383   -0.7086 F   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0476    0.1120    0.0663 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3542   -1.0307   -0.3007 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    1.6975   -0.6868   -0.2044 Br  0  0  0  0  0  0  0  0  0  0  0  0
   -0.1352    0.3672    1.1474 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  2  5  1  0
M  END


The RDKit assumes that the conformations it reads are 3D unless that flag
is set to "2D".
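
As a rough illustration of where that flag lives, here is a small plain-Python
sketch. It is not the RDKit parser; the function name coords_are_3d is
hypothetical, and it relies on my reading of the CTfile header layout, in which
the dimensional code occupies columns 21-22 of the second header line. It
applies the same default as described above: 3D unless the code reads "2D".

```python
def coords_are_3d(mol_block):
    """Return False only when the header's dimension code reads '2D'."""
    lines = mol_block.splitlines()
    # Header line 2: initials (2 chars), program (8), date/time (10), then the code.
    code = lines[1][20:22] if len(lines) > 1 else ""
    return code.upper() != "2D"


flagged_2d = "\n     RDKit          2D\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
unflagged = "\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
print(coords_are_3d(flagged_2d))  # False
print(coords_are_3d(unflagged))   # True
```

A file whose second header line is empty or carries no code is treated as 3D
under this rule, even when the coordinates are a flat depiction.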


>
> Second question,
>
> When I read in a file with a 3D conformer, and then later use
> compute2DCoords, followed by WedgeMolBonds, it adds wedges to non-chiral
> atoms.  Is this by design?
>

No, this definitely should not happen. If it does, it's a bug.

I'm guessing that what you're seeing is the wedges that were originally
present in your mol file (at least in the example you provide below).
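
One way to check that guess without any toolkit is to look at the stereo field
of the bond lines. The sketch below is plain Python, not RDKit; the function
name wedged_bonds is hypothetical, and it assumes the fixed-width V2000 bond
line layout (three characters each for atom 1, atom 2, bond type, and bond
stereo).

```python
def wedged_bonds(bond_lines):
    """Return (begin, end, stereo) for V2000 bond lines with a nonzero stereo field."""
    found = []
    for line in bond_lines:
        # Fixed-width fields: atom 1, atom 2, bond type, bond stereo (3 chars each).
        begin, end = int(line[0:3]), int(line[3:6])
        stereo = int(line[9:12])
        if stereo != 0:  # 1 = wedge (up), 6 = hash (down), 4 = either
            found.append((begin, end, stereo))
    return found


# Four bond lines copied from the aspirin.mol posted in this thread.
bonds = [
    "  8 18  1  0  0  0",
    " 10 11  1  1  0  0",
    " 12 20  1  6  0  0",
    " 12 21  1  1  0  0",
]
print(wedged_bonds(bonds))  # [(10, 11, 1), (12, 20, 6), (12, 21, 1)]
```

Three bonds in that file already carry wedge or hash flags, on atoms that are
not stereocenters.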


> It definitely does serve to convey 3D information from the file in the
> depiction, but I'd also like to know how to disable it if possible.  Would
> running assignStereochemistry fix the issue?
>

Yes, if you ensure that the "cleanIt" argument is set.

The fact that this isn't happening for you indicates that you are reading
the molecules in without sanitizing them - the mol file parser calls
assignStereochemistry() by default if you sanitize. Are you sure that you
should be disabling sanitization?

-greg



>
> The mol file for the second question is pasted below, and here is the
> generated depiction,
>
> [image: Inline image 2]
>
>
> aspirin.mol
>
>  21 21  0  0  0
>    -2.2240   -1.4442   -0.4577 C   0  0  0  0  0
>    -2.1657   -0.0545   -0.5349 C   0  0  0  0  0
>    -0.9916    0.6085   -0.1694 C   0  0  0  0  0
>     0.1471   -0.0738    0.2764 C   0  0  0  0  0
>     0.0751   -1.4832    0.3390 C   0  0  0  0  0
>    -1.1052   -2.1532   -0.0188 C   0  0  0  0  0
>     1.2412   -2.2934    0.7925 C   0  0  0  0  0
>     2.4223   -1.7619    1.1727 O   0  0  0  0  0
>     1.1650   -3.5162    0.8364 O   0  0  0  0  0
>     1.2795    0.6233    0.5954 O   0  0  0  0  0
>     1.1005    1.7577    1.3258 C   0  0  0  0  0
>     2.4429    2.3635    1.6825 C   0  0  0  0  0
>     0.0255    2.2041    1.6578 O   0  0  0  0  0
>    -3.1430   -1.9775   -0.7500 H   0  0  0  0  0
>    -3.0382    0.5167   -0.8915 H   0  0  0  0  0
>    -0.9608    1.7083   -0.2479 H   0  0  0  0  0
>    -1.1740   -3.2520    0.0315 H   0  0  0  0  0
>     2.9869   -2.5132    1.4166 H   0  0  0  0  0
>     2.3142    3.3967    2.0773 H   0  0  0  0  0
>     3.1051    2.4141    0.7884 H   0  0  0  0  0
>     2.9391    1.7459    2.4657 H   0  0  0  0  0
>   1  2  2  0  0  0
>   1  6  1  0  0  0
>   1 14  1  0  0  0
>   2  3  1  0  0  0
>   2 15  1  0  0  0
>   3  4  2  0  0  0
>   3 16  1  0  0  0
>   4  5  1  0  0  0
>   4 10  1  0  0  0
>   5  6  2  0  0  0
>   5  7  1  0  0  0
>   6 17  1  0  0  0
>   7  8  1  0  0  0
>   7  9  2  0  0  0
>   8 18  1  0  0  0
>  10 11  1  1  0  0
>  11 12  1  0  0  0
>  11 13  2  0  0  0
>  12 19  1  0  0  0
>  12 20  1  6  0  0
>  12 21  1  1  0  0
> M  END
>
>
> Thanks,
>
> Jason
>
> 

[Rdkit-discuss] mol file parsing, 3D or 2D

2018-01-14 Thread Jason Biggs
Two questions about mol file conformer reading:

I was looking through the .mol files included for testing and chose
"Code/GraphMol/Depictor/test_data/7UPJ_spread.mol" at random.

When I read in this file using the RDKit::MolFileToMol function, and then
query its conformer's is3D() method, it returns true even though it is
definitely a 2D depiction in the file.  I'm not totally familiar with the
MDL file specifications, so is there some flag I'm missing in the file?

Second question,

When I read in a file with a 3D conformer, and then later use
compute2DCoords, followed by WedgeMolBonds, it adds wedges to non-chiral
atoms.  Is this by design?  It definitely does serve to convey 3D
information from the file in the depiction, but I'd also like to know how
to disable it if possible.  Would running assignStereochemistry fix the
issue?

The mol file for the second question is pasted below, and here is the
generated depiction,

[image: Inline image 2]


aspirin.mol

 21 21  0  0  0
   -2.2240   -1.4442   -0.4577 C   0  0  0  0  0
   -2.1657   -0.0545   -0.5349 C   0  0  0  0  0
   -0.9916    0.6085   -0.1694 C   0  0  0  0  0
    0.1471   -0.0738    0.2764 C   0  0  0  0  0
    0.0751   -1.4832    0.3390 C   0  0  0  0  0
   -1.1052   -2.1532   -0.0188 C   0  0  0  0  0
    1.2412   -2.2934    0.7925 C   0  0  0  0  0
    2.4223   -1.7619    1.1727 O   0  0  0  0  0
    1.1650   -3.5162    0.8364 O   0  0  0  0  0
    1.2795    0.6233    0.5954 O   0  0  0  0  0
    1.1005    1.7577    1.3258 C   0  0  0  0  0
    2.4429    2.3635    1.6825 C   0  0  0  0  0
    0.0255    2.2041    1.6578 O   0  0  0  0  0
   -3.1430   -1.9775   -0.7500 H   0  0  0  0  0
   -3.0382    0.5167   -0.8915 H   0  0  0  0  0
   -0.9608    1.7083   -0.2479 H   0  0  0  0  0
   -1.1740   -3.2520    0.0315 H   0  0  0  0  0
    2.9869   -2.5132    1.4166 H   0  0  0  0  0
    2.3142    3.3967    2.0773 H   0  0  0  0  0
    3.1051    2.4141    0.7884 H   0  0  0  0  0
    2.9391    1.7459    2.4657 H   0  0  0  0  0
  1  2  2  0  0  0
  1  6  1  0  0  0
  1 14  1  0  0  0
  2  3  1  0  0  0
  2 15  1  0  0  0
  3  4  2  0  0  0
  3 16  1  0  0  0
  4  5  1  0  0  0
  4 10  1  0  0  0
  5  6  2  0  0  0
  5  7  1  0  0  0
  6 17  1  0  0  0
  7  8  1  0  0  0
  7  9  2  0  0  0
  8 18  1  0  0  0
 10 11  1  1  0  0
 11 12  1  0  0  0
 11 13  2  0  0  0
 12 19  1  0  0  0
 12 20  1  6  0  0
 12 21  1  1  0  0
M  END


Thanks,

Jason