Hi Brian, Konrad,

Just a side note: it's not a crash. Python/Boost is just complaining that
the first argument is None when it should be an RDKit Mol instance
(MolFromSmiles returns None for a SMILES it cannot parse or sanitize).
Instead of filtering out every SMILES with a lowercase "s", you should check
whether mol is None in your for loop and skip the molecules that are.
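
A minimal sketch of that loop. It assumes the SMILES are already in a
Python iterable, and `canonicalize_all` is a hypothetical helper name. By
default it uses Chem.MolFromSmiles and Chem.MolToSmiles. The parse/write
functions can be swapped for stubs, so the logic can be tried without RDKit
installed.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def canonicalize_all(
    smiles_list: Iterable[str],
    parse: Optional[Callable[[str], object]] = None,
    write: Optional[Callable[[object], str]] = None,
) -> Tuple[List[str], List[str]]:
    """Return (canonical SMILES of parsed molecules, input SMILES that failed)."""
    if parse is None or write is None:
        # Imported lazily so stub parse/write functions work without RDKit.
        from rdkit import Chem
        parse = parse or Chem.MolFromSmiles
        write = write or Chem.MolToSmiles
    canonical: List[str] = []
    failed: List[str] = []
    for smi in smiles_list:
        mol = parse(smi)
        if mol is None:
            # Parsing or sanitization failed (e.g. kekulization or valence
            # problems): remember the input and move on instead of handing
            # None to MolToSmiles, which is what raises the ArgumentError.
            failed.append(smi)
            continue
        canonical.append(write(mol))
    return canonical, failed
```

With RDKit installed, `canonical, failed = canonicalize_all(smiles)` drops
only the molecules that actually fail. It does not also drop every SMILES
that happens to contain an aromatic sulfur.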

----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-08-07 20:39 GMT+02:00 Bennion, Brian <benni...@llnl.gov>:

> *From:* Bennion, Brian
> *Sent:* Monday, August 07, 2017 11:39
> *To:* 'Konrad Koehler' <konrad.koeh...@me.com>
> *Subject:* RE: [Rdkit-discuss] using rdkit to read in chembl23 1.7
> million compounds
>
>
>
> Hello Konrad,
>
> Thank you for your response.
>
> For the handful of compounds I looked at:
>
> multi-ring compounds with ring-closure labels %11 up to %14, coordinated
> to zinc, had issues
>
> aromatic carbocations ([c+]) had issues
>
>
>
> As a side note, I tried reading in the 2D SDF file that ChEMBL supplies
> instead, which reduced the number of failed molecules to 253.
>
> There were still many warnings about ambiguous stereochemistry, and
> strange tags like STY at the end of the molecules.
>
>
>
> Brian
>
>
>
> *From:* Konrad Koehler [mailto:konrad.koeh...@me.com]
> *Sent:* Monday, August 07, 2017 11:25
> *To:* Bennion, Brian <benni...@llnl.gov>
> *Subject:* Re: [Rdkit-discuss] using rdkit to read in chembl23 1.7
> million compounds
>
>
>
> Hi Brian,
>
>
>
> I had similar problems trying to read, fragment, and canonicalize the
> ZINC “In Stock” database of roughly one million compounds. Most of the
> problematic structures contained aromatic sulfur atoms (thiophene itself
> is no problem; most of the crashes came from more complex heteroaromatic
> systems containing sulfur). Filtering the input file to remove SMILES
> strings with a lowercase “s” let me process the rest of the file
> without RDKit crashing.
>
>
>
> Cheers,
>
>
>
> Konrad
>
>
>
> crash dump:
>
> Can't kekulize mol.
>     child_node = AllChem.CanonSmiles(child_node)
>   File "/Users/konradkoehler/anaconda/lib/python2.7/site-packages/rdkit/Chem/__init__.py", line 43, in CanonSmiles
>     return MolToSmiles(m, useChiral)
> Boost.Python.ArgumentError: Python argument types in
>     rdkit.Chem.rdmolfiles.MolToSmiles(NoneType, int)
> did not match C++ signature:
>     MolToSmiles(RDKit::ROMol mol, bool isomericSmiles=False,
>         bool kekuleSmiles=False, int rootedAtAtom=-1, bool canonical=True,
>         bool allBondsExplicit=False, bool allHsExplicit=False)
>
>
>
>
>
> On 7 Aug 2017, at 18:36, Bennion, Brian <benni...@llnl.gov> wrote:
>
>
>
> Hello,
>
>
>
> This might be a nitpicky question. I am attempting to read in the SMILES
> strings for the 1.7 million non-biological compounds in the latest ChEMBL 23
> release. As it turns out, 382 compounds fail to be read by RDKit.
>
> The errors are either kekulization failures or valence errors.
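
To see how the failures split between kekulization and valence problems,
one option is to parse without sanitizing (Chem.MolFromSmiles(smi,
sanitize=False)) and then call Chem.SanitizeMol yourself, so that each
molecule's error message is available. A minimal sketch: `tally_failures`,
`classify_error` and `_rdkit_sanitize` are hypothetical names. The substring
matching assumes RDKit's usual wording ("Can't kekulize mol", "Explicit
valence ... is greater than permitted"), which may differ between versions.

```python
from typing import Callable, Dict, Iterable, List, Optional


def _rdkit_sanitize(smi: str) -> None:
    """Raise an exception if RDKit cannot parse and sanitize `smi`."""
    from rdkit import Chem  # imported lazily; only the default checker needs it

    mol = Chem.MolFromSmiles(smi, sanitize=False)
    if mol is None:
        raise ValueError("SMILES parse error")
    Chem.SanitizeMol(mol)  # raises on kekulization/valence problems


def classify_error(message: str) -> str:
    """Map an error message to a coarse category (assumed message wording)."""
    text = message.lower()
    if "kekulize" in text:
        return "kekulization"
    if "valence" in text:
        return "valence"
    return "other"


def tally_failures(
    smiles_list: Iterable[str],
    check: Optional[Callable[[str], None]] = None,
) -> Dict[str, List[str]]:
    """Group the SMILES for which `check` raises, keyed by error category."""
    check = check or _rdkit_sanitize
    groups: Dict[str, List[str]] = {}
    for smi in smiles_list:
        try:
            check(smi)
        except Exception as exc:  # exact exception class varies across RDKit versions
            groups.setdefault(classify_error(str(exc)), []).append(smi)
    return groups
```

`{k: len(v) for k, v in tally_failures(smiles).items()}` would then give a
per-category count of the failing molecules.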
>
>
>
> Has anyone attempted this task before?
>
> Brian
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>