Greg,
I found the files of interest and ran a few tests. The files resulting from
the tests are in the attached archive and here are the details.
The structures in question came from Shoichet's non-aggregators set,
which was available on his web page. My original intent was to convert the
SMILES files from the Shoichet set to SDF. This went smoothly enough until
I had to process the SDF for a different purpose. Four structures were
found to cause problems.
In the attached archive, each offending structure has 5 associated files
named according to the NGC ID associated with the original SMILES:
.smi - The original SMILES.
.sdf - The result of my original SMILES to SDF conversion, which has nan as
the atom coordinates.
.mol - Generated manually today by:
m = Chem.MolFromSmiles('offending SMILES')
AllChem.Compute2DCoords(m)
print >>file('blah.mol','w+'), Chem.MolToMolBlock(m)
_fix.smi - This is the RDKit generated SMILES for the structure.
_fix.mol - The result of the following after the code snip above:
m=Chem.MolFromSmiles(Chem.MolToSmiles(m))
AllChem.Compute2DCoords(m)
print >>file('blah_fix.mol','w+'), Chem.MolToMolBlock(m)
Only 14662 did not result in a fixed mol file. Interestingly, the first bad
conversion only has nan for coordinates of the platinum hexachloride. After
the SMILES round-trip, all coordinates are nan.
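The difference between "some atoms nan" and "all atoms nan" can be checked without RDKit by reading the atom block directly. What follows is only a minimal sketch, not something from the attached archive: the function name bad_coordinate_atoms is hypothetical, and it assumes standard fixed-width V2000 atom lines (x, y and z in three 10-column fields) and one mol block per string. Any field that is not a finite number counts as bad, which also covers tokens such as -1.#IND that float() cannot parse.

```python
import math


def _is_finite_field(field):
    """True if a fixed-width coordinate field holds a finite number."""
    try:
        return math.isfinite(float(field))
    except ValueError:
        # Tokens float() cannot parse (e.g. "-1.#IND") are treated as bad.
        return False


def bad_coordinate_atoms(mol_block):
    """Return (n_atoms, bad) for one V2000 mol block.

    bad lists the 1-based indices of atoms whose x, y or z field
    is not a finite number.
    """
    lines = mol_block.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: first 3 columns = atom count
    bad = []
    for index, line in enumerate(lines[4:4 + n_atoms], start=1):
        fields = (line[0:10], line[10:20], line[20:30])  # x, y, z
        if not all(_is_finite_field(f) for f in fields):
            bad.append(index)
    return n_atoms, bad


# Hand-built two-atom block (made-up data) with one bad atom.
demo = "\n".join([
    "",
    "     RDKit          2D",
    "",
    "%3d%3d  0  0  0  0  0  0  0  0999 V2000" % (2, 1),
    "%10s%10s%10s C   0" % ("0.0000", "0.0000", "0.0000"),
    "%10s%10s%10s O   0" % ("nan", "nan", "0.0000"),
    "  1  2  1  0",
    "M  END",
])

n_atoms, bad = bad_coordinate_atoms(demo)
print("%d of %d atoms have bad coordinates" % (len(bad), n_atoms))
```

If len(bad) equals n_atoms the whole depiction failed; a shorter list means only part of the structure was affected.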
Please let me know if you need any further details.
-Kirk
On Sat, May 1, 2010 at 10:24 PM, Greg Landrum greg.land...@gmail.com wrote:
On Fri, Apr 30, 2010 at 12:56 PM, Greg Landrum greg.land...@gmail.com
wrote:
I don't see any problems in your script, so I have to assume that it's
a problem with the binary you're using. I'm travelling and don't have
a windows machine handy, so this will have to wait until I'm back home
this weekend.
Ok, I was able to reproduce this on my windows box. It's clearly a
problem with the windows build:
In [29]: m = Chem.MolFromSmiles('OC(=O)C1CCCC1')
In [30]: AllChem.Compute2DCoords(m)
Out[30]: 0
In [31]: print Chem.MolToMolBlock(m)
--- print(Chem.MolToMolBlock(m))
RDKit 2D
8 8 0 0 0 0 0 0 0 0999 V2000
-1.#IND1.#QNB0. O 0 0 0 0 0 0 0 0 0 0 0 0
-1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
-1.#IND1.#QNB0. O 0 0 0 0 0 0 0 0 0 0 0 0
-1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
-1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
-1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
-1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
-1.#IND1.#QNB0. C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 2 3
2 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 1 0
8 4 1 0
M END
I will look into this and see where the problem lies.
Note: whatever is going on here doesn't affect every depiction; other
molecules do end up with correct coordinates.
Best Regards,
-greg
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
nan.tgz
Description: GNU Zip compressed data