Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-07 Thread Michal Krompiec
On 6 April 2014 05:36, Greg Landrum greg.land...@gmail.com wrote:
>> Some substituted oligoarenes with at least 8 rings in the chain, not
>> particularly fancy (I think the problem is related more to the length
>> of the molecule than to the nature of the repeat units). I tried
>> various options in the EmbedMolecule function, but without success.
>> This error occurred in less than 10% of tested structures. If anyone is
>> interested in correcting this, I think I can produce a
>> non-confidential input example...
>
> I would certainly be interested to see this. I'm not sure what can be done,
> but it's interesting to have the examples.

Try this one with random coordinate generation:

Cc1cc(cc3c1c2ccc(cc2C3(C)C)c4ccc(c(C)c4C)c5ccc(s5)c7ccc8c6ccc(cc6C(C)(C)c8c7)c%14ccc(c9ccc(s9)c%10cc%12c(cc%10CC)c%11ccccc%11C%12(C)C)c%13cc(C)ccc%13%14)c%15ccc(s%15)c%17ccc(c%16ccccc%16%17)c%18cc%20c(cc%18)c%19c(C)c(C)c(cc%19C%20(C)C)c%21sc(cc%21C)c%23ccc%24c%22ccc(cc%22C(C)(C)c%24c%23)c%25ccc(s%25)c%31ccc(c%27ccc%28c%26c(C)cc(cc%26C(C)(C)c%28c%27)c%29ccc(s%29)c%30cccs%30)c(C)c%31C

AllChem.EmbedMolecule(mol, useRandomCoords=True)
AllChem.MMFFOptimizeMolecule(mol, maxIters=100)

I have just run it 3 times and each time it produced a knot, which
cannot be disentangled by optimization. This example is completely
artificial, but I got similar results in a few % of real cases. It
is not an issue for me, actually, as I now use Corina to get the
starting conformations and then optimize them with MMFF in RDKit.

>> and KNIME.
> Which conformation generator in knime?
None, I was using knime just to browse 2D structures.

Best wishes,
Michal

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread Michał Nowotka
Hi,

I've found this (http://code.google.com/p/rdkit/wiki/Generating3DCoordinates)
wiki page suggesting how to compute 3D coordinates:

from rdkit import Chem
from rdkit.Chem import AllChem
m = Chem.MolFromSmiles('c1ccccc1C(=O)O')
AllChem.EmbedMolecule(m)
# the molecule now has a crude conformation, clean it up:
AllChem.UFFOptimizeMolecule(m)

On the other hand, Getting started document describes this differently:

AllChem.EmbedMolecule(m2)
AllChem.UFFOptimizeMolecule(m2)

In the meantime, someone suggested that I should call:

Chem.AddHs(m)

Before calculating 3D properties.

So what is an ultimate way of doing this? Let's assume I already have an
RDKit molecule:

m = Chem.MolFromSmiles('Cc1ccccc1')

or:

m = Chem.MolFromMolFile('data/input.mol')

what should I do with 'm' to compute 3D coordinates?

Also, once we have MMFF implemented in rdkit, is there any benefit of
using UFF (apart from maybe backwards compatibility, as this is a new
feature)?
Is UFF significantly faster than MMFF?

Kind regards,

Michał Nowotka


Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread JP
I don't know about the ultimate way: but this works for me (to generate n
conformers):

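from rdkit import Chem
from rdkit.Chem import AllChem

# mol is an existing RDKit molecule; n is the number of conformers wanted
writer = Chem.SDWriter('some_file.sdf')
# add hydrogens
molH = Chem.AddHs(mol)
# create n conformers for the molecule
confIds = AllChem.EmbedMultipleConfs(molH, n)
# energy-minimize each conformer
for confId in confIds:
    AllChem.UFFOptimizeMolecule(molH, confId=confId)
    # write to output file
    writer.write(molH, confId=confId)
writer.close()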

Replace EmbedMultipleConfs with EmbedMolecule if you are interested in
generating only one conformer.  UFFOptimizeMolecule(...) returns an
integer: 0 tells you the optimization has converged, 1 that it has not.

UFF is significantly faster, and I do not think the results are worse
than the ones generated by MMFF.  At least for the small molecules I was
looking at, but I am sure there are exceptions to this.  Paolo has done a
lot of excellent work on the forcefields, and I think the amide and
carbonyl planarity issues for UFF have now been fixed.






-
Jean-Paul Ebejer
Early Stage Researcher




Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread Michal Krompiec
Michal: from my experience, MMFF in rdkit is slower than UFF (ca. 2x
for my test cases) but converges faster, so in certain cases the
overall execution time (embedding+optimization) won't be much shorter
for UFF. It really depends on what molecules you work on. AFAIK
rdkit's 3d coord generation algorithm was designed for small- to
medium-sized druglike molecules, so you may expect it to fail in
areas very far from this territory. For example, it does not work well
for long conjugated oligomers - sometimes it produces molecular knots
instead of straight strands, and is quite slow for large systems.
That's why I switched to CORINA, btw.
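
Since the trade-off between cost per step and speed of convergence depends on the molecules, total wall-clock time is probably best measured on one's own structures. A minimal sketch of how that could be done follows; time_call is a hypothetical helper (not part of RDKit), and the commented usage assumes mh is a molecule that already has hydrogens and an embedded conformer.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage with RDKit. Each force field gets its own copy of the
# molecule so that both start from the same embedded geometry:
#   from rdkit import Chem
#   from rdkit.Chem import AllChem
#   status_uff, t_uff = time_call(
#       AllChem.UFFOptimizeMolecule, Chem.Mol(mh), maxIters=1000)
#   status_mmff, t_mmff = time_call(
#       AllChem.MMFFOptimizeMolecule, Chem.Mol(mh), maxIters=1000)
```

A fair comparison would also check the returned status, since a fast run that did not converge is not comparable to a slower one that did.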
Best wishes,
Michal Krompiec




Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread Paul Emsley
On 05/04/14 19:04, Michal Krompiec wrote:


> For example, it does not work well
> for long conjugated oligomers - sometimes it produces molecular knots
> instead of straight strands, and is quite slow for large systems.

Can you expand on that? What sort of long conjugated oligomers were you 
looking at? What was the nature of the input from which you were making 
rdkit molecules?


Paul.




Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread Michal Krompiec
On 5 April 2014 19:11, Paul Emsley pems...@mrc-lmb.cam.ac.uk wrote:
> On 05/04/14 19:04, Michal Krompiec wrote:
>
>> For example, it does not work well
>> for long conjugated oligomers - sometimes it produces molecular knots
>> instead of straight strands, and is quite slow for large systems.
>
> Can you expand on that? What sort of long conjugated oligomers were you
> looking at?

Some substituted oligoarenes with at least 8 rings in the chain, not
particularly fancy (I think the problem is related more to the length
of the molecule than to the nature of the repeat units). I tried
various options in the EmbedMolecule function, but without success.
This error occurred in less than 10% of tested structures. If anyone is
interested in correcting this, I think I can produce a
non-confidential input example...

> What was the nature of the input from which you were making
> rdkit molecules?

SMILES. The same input worked 100% fine with CORINA (which was, btw,
approx. 5-20x faster on the same computer) and KNIME.

Regards,
Michal



Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread Greg Landrum
Hi Michal,

JP gave a good answer already, I'll just add a few things.

First: thanks for pointing out the missing call to AddHs in the
documentation. I've fixed that.

On Sat, Apr 5, 2014 at 1:35 PM, Michał Nowotka mmm...@gmail.com wrote:

> Hi,
>
> I've found this (http://code.google.com/p/rdkit/wiki/Generating3DCoordinates)
> wiki page suggesting how to compute 3D coordinates:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> m = Chem.MolFromSmiles('c1ccccc1C(=O)O')
> AllChem.EmbedMolecule(m)
> # the molecule now has a crude conformation, clean it up:
> AllChem.UFFOptimizeMolecule(m)
>
> On the other hand, Getting started document describes this differently:
>
> AllChem.EmbedMolecule(m2)
> AllChem.UFFOptimizeMolecule(m2)

Those are the same, right?

> In the meantime, someone suggested that I should call:
>
> Chem.AddHs(m)
>
> Before calculating 3D properties.
>
> So what is an ultimate way of doing this? Let's assume I already have an
> RDKit molecule:
>
> m = Chem.MolFromSmiles('Cc1ccccc1')
>
> or:
>
> m = Chem.MolFromMolFile('data/input.mol')
>
> what should I do with 'm' to compute 3D coordinates?


JP's answer was good. If you want a single 3D conformation you should
AddHs, Embed, and Optimize. If you don't want the Hs in the final molecule,
you can RemoveHs after the optimization.
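
As a rough illustration of that sequence, the four steps could be wrapped as below. This is only a sketch: make_3d is a hypothetical name, toluene is taken from the question as the test input, and the fixed randomSeed is my own addition to make the embedding reproducible.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def make_3d(mol, seed=42):
    """AddHs -> Embed -> Optimize -> RemoveHs; returns a new molecule."""
    mol_h = Chem.AddHs(mol)
    # EmbedMolecule returns -1 when no conformation could be generated
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == -1:
        raise RuntimeError("embedding failed")
    # the return value (0 = converged) is ignored here for brevity
    AllChem.UFFOptimizeMolecule(mol_h)
    # drop the hydrogens again; heavy-atom coordinates are kept
    return Chem.RemoveHs(mol_h)


m3d = make_3d(Chem.MolFromSmiles("Cc1ccccc1"))
print(m3d.GetNumConformers(), m3d.GetConformer().Is3D())
```

If the hydrogens are wanted in the final structure, the RemoveHs step can simply be left out.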


> Also, once we have MMFF implemented in rdkit, is there any benefit of using
> UFF (apart from maybe backwards compatibility, as this is a new feature)?
>
> Is UFF significantly faster than MMFF?

MMFF tends to generate better geometries (for some definition of
better), UFF tends to be faster and will work for almost any molecule.
There are many molecules where MMFF parameters are missing and you will
have to use UFF.
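
One way to act on this is to test for MMFF parameters first and fall back to UFF only when they are missing. A minimal sketch, assuming the molecule already has hydrogens and a conformer; optimize_best_available is a hypothetical helper name, while MMFFHasAllMoleculeParams is the RDKit call that reports whether MMFF can handle the molecule. Benzoic acid and the fixed seed are just illustrative choices.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def optimize_best_available(mol_h, max_iters=200):
    """Optimize with MMFF when parameters exist, otherwise with UFF.

    Returns (force_field_name, status); status 0 means converged.
    """
    if AllChem.MMFFHasAllMoleculeParams(mol_h):
        return "MMFF", AllChem.MMFFOptimizeMolecule(mol_h, maxIters=max_iters)
    return "UFF", AllChem.UFFOptimizeMolecule(mol_h, maxIters=max_iters)


mol_h = Chem.AddHs(Chem.MolFromSmiles("OC(=O)c1ccccc1"))
AllChem.EmbedMolecule(mol_h, randomSeed=7)
print(optimize_best_available(mol_h, max_iters=500))
```

Returning the force-field name alongside the status makes it easy to see afterwards which structures in a batch had to fall back to UFF.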

-greg


Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread Greg Landrum
On Sat, Apr 5, 2014 at 8:44 PM, Michal Krompiec
michal.kromp...@gmail.com wrote:

> On 5 April 2014 19:11, Paul Emsley pems...@mrc-lmb.cam.ac.uk wrote:
>> On 05/04/14 19:04, Michal Krompiec wrote:
>>
>>> For example, it does not work well
>>> for long conjugated oligomers - sometimes it produces molecular knots
>>> instead of straight strands, and is quite slow for large systems.
>>
>> Can you expand on that? What sort of long conjugated oligomers were you
>> looking at?
>
> Some substituted oligoarenes with at least 8 rings in the chain, not
> particularly fancy (I think the problem is related more to the length
> of the molecule than to the nature of the repeat units). I tried
> various options in the EmbedMolecule function, but without success.
> This error occurred in less than 10% of tested structures. If anyone is
> interested in correcting this, I think I can produce a
> non-confidential input example...


I would certainly be interested to see this. I'm not sure what can be done,
but it's interesting to have the examples.

I played around a bit with a very simple example and was able to get
reasonable rod-like conformers:

In [43]: m = Chem.MolFromSmiles('c12ccccc1.'+'c12ccc3cc1.c13ccc2cc1.'*6+'c12ccccc1')

In [44]: mh= Chem.AddHs(m)

In [45]: AllChem.EmbedMolecule(mh)
Out[45]: 0

In [46]: AllChem.UFFOptimizeMolecule(mh,maxIters=1000)
Out[46]: 0

Note that the return value of both EmbedMolecule and UFFOptimizeMolecule is
important: if EmbedMolecule returns -1 it means the embedding failed (more
on this below) and if UFFOptimizeMolecule returns anything other than 0 it
means that the optimization did not converge and that more iterations are
needed (you can just call it again).
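
The "just call it again" advice can be written as a small loop. The sketch below takes the optimizer as an argument so the same loop serves UFF and MMFF; optimize_until_converged is a hypothetical helper, not an RDKit function, and it relies on the optimizer updating the conformer in place so that each call continues from the previous geometry.

```python
def optimize_until_converged(mol, optimize, max_rounds=5, **kwargs):
    """Call optimize(mol, **kwargs) until it returns 0.

    Returns the number of calls that were needed, or None if the
    optimization had still not converged after max_rounds calls.
    """
    for round_no in range(1, max_rounds + 1):
        if optimize(mol, **kwargs) == 0:
            return round_no
    return None


# Hypothetical usage with RDKit:
#   from rdkit.Chem import AllChem
#   rounds = optimize_until_converged(
#       mh, AllChem.UFFOptimizeMolecule, maxIters=1000)
```

A None result is a hint that the structure deserves a closer look rather than yet more iterations.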

In the simple tests I just did, UFF did occasionally produce geometries
that were not rod-like. MMFF was always able to give a properly extended
geometry.

If EmbedMolecule fails, you can always try it again (there's a random
process involved, so running it again gives different results) or you can
try setting the useRandomCoords argument to True. This uses a different
approach to generate the coordinates and often works better for large
molecules. There were a couple of threads on this topic back in 2009;
here's one of the messages to help find the rest:
http://www.mail-archive.com/rdkit-discuss%40lists.sourceforge.net/msg00481.html
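
Both suggestions (try again, then switch to random starting coordinates) can be combined in a few lines. This is a sketch under assumptions: embed_with_fallback is a hypothetical helper, the number of attempts is arbitrary, and retrying only helps when the embedding function is called without a fixed random seed.

```python
def embed_with_fallback(mol, embed, attempts=3):
    """Try embed(mol) a few times, then once with useRandomCoords=True.

    Returns the conformer id from the first successful call, or -1 if
    every attempt failed.
    """
    for _ in range(attempts):
        conf_id = embed(mol)
        if conf_id != -1:
            return conf_id
    # last resort: start from random coordinates instead
    return embed(mol, useRandomCoords=True)


# Hypothetical usage with RDKit:
#   from rdkit.Chem import AllChem
#   conf_id = embed_with_fallback(mh, AllChem.EmbedMolecule)
```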

The general problem with this kind of molecule and the distance-geometry
based approach is that the code doesn't have enough information to know
how far apart atoms in a big molecule should be. This means that the
forcefield (UFF or MMFF) really has a lot of work to do in order to clean
the geometries up. In playing around with some of these simple systems, it
seems like MMFF is able to do this more reliably than UFF.
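
A crude way to spot conformers that have collapsed instead of extending is to look at the largest interatomic distance. This check is not something proposed in the thread, only an illustrative assumption; max_extent is a hypothetical helper, and it measures compactness, so it cannot prove that a chain is actually knotted.

```python
import math
from itertools import combinations


def max_extent(coords):
    """Largest distance between any two points in coords.

    coords is a list of (x, y, z) tuples; the loop over all pairs is
    O(N^2), which is fine for a single molecule.
    """
    return max(math.dist(a, b) for a, b in combinations(coords, 2))


# Hypothetical usage with RDKit, after embedding and optimization:
#   coords = [tuple(p) for p in mh.GetConformer().GetPositions()]
#   print(max_extent(coords))
```

For a rod-like oligomer the extent should grow roughly in proportion to the number of repeat units, so a much smaller value than expected suggests a folded or tangled geometry.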



>> What was the nature of the input from which you were making
>> rdkit molecules?
>
> SMILES. The same input worked 100% fine with CORINA (which was, btw,
> approx. 5-20x faster on the same computer)

Generating a single realistic conformation of a molecule is exactly what
Corina is for; it's a nice piece of software, and I'm really not surprised
that it's significantly faster than the RDKit, particularly for these
large molecules.

> and KNIME.

Which conformation generator in knime?

-greg