Dear Marshall, On Sat, Mar 7, 2009 at 5:26 AM, Marshall Levesque <marsh...@emolecules.com> wrote: > > Your suggestion fixed my problems with the file I was working with. I am a > novice with both to RDKit and Python.
Glad to hear that the script fixed at least part of the problems. > > I have a new question for you: I am using the model script you provided, > but with a different set of compounds. I will reorder your remaining two questions since the first has an easier (and more fundamental) answer. > AND THEN, I tried to stop using the UFFOptimizeMolecule function and use > only EmbedMolecule and I received this error: > > ############ > embed: 613 > embed: 614 > embed: 615 > embed: 616 > embed: 617 > embed: 618 > [20:03:38] Explicit valence for atom # 0 N greater than permitted > [20:03:38] Unexpected error hit on line 38532 > [20:03:38] ERROR: moving to the begining of the next molecule > embed: 619 > Traceback (most recent call last): > File "2-3D.py", line 10, in <module> > AllChem.EmbedMolecule(mol) > Boost.Python.ArgumentError: Python argument types in > Chem.rdDistGeom.EmbedMolecule(NoneType) > did not match C++ signature: > EmbedMolecule(RDKit::ROMol {lvalue} mol, unsigned int maxAttempts=0, int > randomSeed=-1, bool clearConfs=True, bool useRandomCoords=False, double > boxSizeMult=2.0, bool randNegEig=True, unsigned int numZeroFail=1, > boost::python::dict {lvalue} coordMap={}) This type of error message, the Boost.Python.ArgumentError where you have: > Boost.Python.ArgumentError: Python argument types in > SOME_FUNCTION(NoneType) > did not match C++ signature: > SOME_FUNCTION(RDKit::ROMol {lvalue} mol, OTHER_ARGUMENTS) Is the RDKit/Python way of telling you that you have a molecule that the RDKit either failed to process or rejected. > I think this is the #618 and #619 compound(s) that caused the problem: > http://www.emolecules.com/cgi-bin/more?vid=5753883 > http://www.emolecules.com/cgi-bin/more?vid=15920212 The first molecule looks ok to me (and I can read it in from the SMILES given on your web page). The second has a neutral four-coordinate nitrogen, so it's going to be rejected by the RDKit, which is very picky about valence. > If there are valence problems, I can just record those errors and throw out > those molecules. But I do not understand why the FileParseException does > not succeed in proceeding to the next molecule. I am assuming it has to do > with the way the for-loop is constructed, but I wasn't able to figure it > out. The way to skip molecules the RDKit considers "bad" is to know that the molecule processing machinery returns the special value None when it encounters a molecule it's unhappy with. You can test for this in your for loop: #### for i,mol in enumerate(supplier): if mol is None: # record or report error, supplier.GetItemText(i) might be useful continue # do whatever else you want to do with the molecule mols.append(mol) #### I included two placeholders here for your code. The important part is the test to see if the molecule is None and the continue, which causes the rest of the loop body to be skipped for that molecule. > > ##### SCRIPT to embed and optimize ######### > import Chem > from Chem import AllChem > suppl = Chem.SDMolSupplier('usdrug.700.sdf') > > mols = [] > for i,mol in enumerate(suppl): > num = mol.GetNumAtoms() > print 'embed:',i > AllChem.EmbedMolecule(mol) > print 'optimize:',i > AllChem.UFFOptimizeMolecule(mol,500) > mols.append(mol) > > w = Chem.SDWriter('output.sdf') > for mol in mols: w.write(mol) > > ########################## > > ###### OUTPUT error ######## > embed: 529 > optimize: 529 > embed: 530 > optimize: 530 > embed: 531 > optimize: 531 > embed: 532 > optimize: 532 > Traceback (most recent call last): > File "2-3D.py", line 12, in <module> > AllChem.UFFOptimizeMolecule(mol,500) > ValueError: Bad Conformer Id > > ############################### > > I think I narrowed the molecule down to either of these two (with the first > looking more likely to be the problem): > http://www.emolecules.com/cgi-bin/more?vid=5852064 > http://www.emolecules.com/cgi-bin/more?vid=1985607 What is happening here is that the call to EmbedMolecule is failing to generate a conformation. This can happen with very large, flexible molecules like your two examples. The way you check this is to test the return value from EmbedMolecule(), which returns -1 if it fails to generate a conformation. In cases where the standard embedding algorithm fails (returns -1), you can try calling EmbedMolecule again with the useRandomCoords argument set to True: #### [11]>>> m = Chem.MolFromSmiles('O=C(o...@h]1o[c@H](COC(=O)c2cc(O)c(O)c(OC(=O)c3cc(O)c(O)c(O)c3)c2)[C@@H](OC(=O)c2cc(O)c(O)c(OC(=O)c3cc(O)c(O)c(O)c3)c2)[...@h](OC(=O)c2cc(O)c(O)c(OC(=O)c3cc(O)c(O)c(O)c3)c2)[...@h]1oc(=O)c1cc(O)c(O)c(OC(=O)c2cc(O)c(O)c(O)c2)c1)c1cc(O)c(O)c(OC(=O)c2cc(O)c(O)c(O)c2)c1') [12]>>> AllChem.EmbedMolecule(m) Out[12] -1 # <--- indicates failure [13]>>> AllChem.EmbedMolecule(m,useRandomCoords=True) Out[13] 0 # <---- that time it worked #### I will leave it to you to set this up in your code. Note 1: since the useRandomCoords method is more robust (and faster) for larger molecules, it might be worth thinking about testing the size of the molecule (via mol.GetNumAtoms()) and automatically using useRandomCoords for larger systems. Note 2: Now that I look at your code with my "chemistry" eyes on (instead of my "python help" eyes), I notice that you are generating the conformations without Hs on the molecules. This sometimes will give you geometries that aren't particularly happy. You might want to consider using Chem.AddHs before you generate a conformation and then calling Chem.RemoveHs before you write the molecule back out (assuming you don't want the Hs in the output SD file). Note that both Chem.AddHs and Chem.RemoveHs return new molecules. Best Regards, -greg