Re: [Rdkit-discuss] Output from UFFOptimizeMolecules(m)??

Greg Landrum Mon, 09 Mar 2009 05:37:06 +0000

Dear Marshall,

On Sat, Mar 7, 2009 at 5:26 AM, Marshall Levesque
<marsh...@emolecules.com> wrote:
>
> Your suggestion fixed my problems with the file I was working with.  I am a
> novice with both to RDKit and Python.


Glad to hear that the script fixed at least part of the problems.

>
> I have a new question for you:  I am using the model script you provided,
> but with a different set of compounds.

I will reorder your remaining two questions since the first has an
easier (and more fundamental) answer.

> AND THEN, I tried to stop using the UFFOptimizeMolecule function and use
> only EmbedMolecule and I received this error:
>
> ############
> embed: 613
> embed: 614
> embed: 615
> embed: 616
> embed: 617
> embed: 618
> [20:03:38] Explicit valence for atom # 0 N greater than permitted
> [20:03:38] Unexpected error hit on line 38532
> [20:03:38] ERROR: moving to the begining of the next molecule
> embed: 619
> Traceback (most recent call last):
>  File "2-3D.py", line 10, in <module>
>    AllChem.EmbedMolecule(mol)
> Boost.Python.ArgumentError: Python argument types in
>    Chem.rdDistGeom.EmbedMolecule(NoneType)
> did not match C++ signature:
>    EmbedMolecule(RDKit::ROMol {lvalue} mol, unsigned int maxAttempts=0, int
> randomSeed=-1, bool clearConfs=True, bool useRandomCoords=False, double
> boxSizeMult=2.0, bool randNegEig=True, unsigned int numZeroFail=1,
> boost::python::dict {lvalue} coordMap={})

This type of error message, the Boost.Python.ArgumentError where you have:

> Boost.Python.ArgumentError: Python argument types in
>    SOME_FUNCTION(NoneType)
> did not match C++ signature:
>    SOME_FUNCTION(RDKit::ROMol {lvalue} mol, OTHER_ARGUMENTS)

Is the RDKit/Python way of telling you that you have a molecule that
the RDKit either failed to process or rejected.

> I think this is the #618 and #619 compound(s) that caused the problem:
> http://www.emolecules.com/cgi-bin/more?vid=5753883
> http://www.emolecules.com/cgi-bin/more?vid=15920212

The first molecule looks ok to me (and I can read it in from the
SMILES given on your web page). The second has a neutral
four-coordinate nitrogen, so it's going to be rejected by the RDKit,
which is very picky about valence.

> If there are valence problems, I can just record those errors and throw out
> those molecules.  But I do not understand why the FileParseException does
> not succeed in proceeding to the next molecule.  I am assuming it has to do
> with the way the for-loop is constructed, but I wasn't able to figure it
> out.

The way to skip molecules the RDKit considers "bad" is to know that
the molecule processing machinery returns the special value None when
it encounters a molecule it's unhappy with. You can test for this in
your for loop:

####
for i,mol in enumerate(supplier):
  if mol is None:
    # record or report error, supplier.GetItemText(i) might be useful
    continue
  # do whatever else you want to do with the molecule
  mols.append(mol)
####

I included two placeholders here for your code. The important part is
the test to see if the molecule is None and the continue, which causes
the rest of the loop body to be skipped for that molecule.

>
> ##### SCRIPT to embed and optimize  #########
> import Chem
> from Chem import AllChem
> suppl = Chem.SDMolSupplier('usdrug.700.sdf')
>
> mols = []
> for i,mol in enumerate(suppl):
>       num = mol.GetNumAtoms()
>        print 'embed:',i
>        AllChem.EmbedMolecule(mol)
>       print 'optimize:',i
>       AllChem.UFFOptimizeMolecule(mol,500)
>        mols.append(mol)
>
> w = Chem.SDWriter('output.sdf')
> for mol in mols: w.write(mol)
>
> ##########################
>
> ###### OUTPUT error ########
> embed: 529
> optimize: 529
> embed: 530
> optimize: 530
> embed: 531
> optimize: 531
> embed: 532
> optimize: 532
> Traceback (most recent call last):
>  File "2-3D.py", line 12, in <module>
>    AllChem.UFFOptimizeMolecule(mol,500)
> ValueError: Bad Conformer Id
>
> ###############################
>
> I think I narrowed the molecule down to either of these two (with the first
> looking more likely to be the problem):
> http://www.emolecules.com/cgi-bin/more?vid=5852064
> http://www.emolecules.com/cgi-bin/more?vid=1985607

What is happening here is that the call to EmbedMolecule is failing to
generate a conformation. This can happen with very large, flexible
molecules like your two examples. The way you check this is to test
the return value from EmbedMolecule(), which returns -1 if it fails to
generate a conformation.

In cases where the standard embedding algorithm fails (returns -1),
you can try calling EmbedMolecule again with the useRandomCoords
argument set to True:

####
[11]>>> m = 
Chem.MolFromSmiles('O=C(o...@h]1o[c@H](COC(=O)c2cc(O)c(O)c(OC(=O)c3cc(O)c(O)c(O)c3)c2)[C@@H](OC(=O)c2cc(O)c(O)c(OC(=O)c3cc(O)c(O)c(O)c3)c2)[...@h](OC(=O)c2cc(O)c(O)c(OC(=O)c3cc(O)c(O)c(O)c3)c2)[...@h]1oc(=O)c1cc(O)c(O)c(OC(=O)c2cc(O)c(O)c(O)c2)c1)c1cc(O)c(O)c(OC(=O)c2cc(O)c(O)c(O)c2)c1')

[12]>>> AllChem.EmbedMolecule(m)
Out[12] -1      # <--- indicates failure

[13]>>> AllChem.EmbedMolecule(m,useRandomCoords=True)
Out[13] 0      # <---- that time it worked
####

I will leave it to you to set this up in your code.

Note 1: since the useRandomCoords method is more robust (and faster)
for larger molecules, it might be worth thinking about testing the
size of the molecule (via mol.GetNumAtoms()) and automatically using
useRandomCoords for larger systems.

Note 2: Now that I look at your code with my "chemistry" eyes on
(instead of my "python help" eyes), I notice that you are generating
the conformations without Hs on the molecules. This sometimes will
give you geometries that aren't particularly happy. You might want to
consider using Chem.AddHs before you generate a conformation and then
calling Chem.RemoveHs before you write the molecule back out (assuming
you don't want the Hs in the output SD file). Note that both
Chem.AddHs and Chem.RemoveHs return new molecules.

Best Regards,
-greg

Re: [Rdkit-discuss] Output from UFFOptimizeMolecules(m)??

Reply via email to