Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds
Hello Peter,

Great, that just made me realize that I was not using the RDKit version in my most recent conda environment. I reread the 2D SDF file with the latest RDKit version, and now only 31 molecules are tossed out by RDKit's SDMolSupplier. 51 compounds had errors when reading in the SMILES strings.

Brian

From: Peter S. Shenkin [mailto:shen...@gmail.com]
Sent: Monday, August 07, 2017 14:26
To: Bennion, Brian
Cc: Chris Swain; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

That molecule's SMILES is correctly rendered by RDKit, or at least by the version of RDKit behind Slack:

[Inline image 1]

-P.

On Mon, Aug 7, 2017 at 3:54 PM, Bennion, Brian <benni...@llnl.gov> wrote:

The carbocations are in small heterocyclic molecules. see CHEMBL3815233

Brian

From: Chris Swain <sw...@mac.com>
Sent: Monday, August 7, 2017 11:46:30 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

I've not tried to read in ChEMBL but I have tried to process other large datasets e.g. ZINC. My impression was that problems arose with small heterocyclic systems, particularly if fused or containing multiple different heteroatoms. I did wonder if the different aromaticity models might be the issue.

Chris

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
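For anyone wanting to reproduce the kind of tally Brian reports (molecules dropped by the SD supplier, SMILES that fail to parse), here is a minimal sketch. It is not Brian's script. The helper names `tally_failures` and `report` are made up for illustration, and the only RDKit behaviour relied on is that `Chem.SDMolSupplier` yields `None` for a record it cannot read and `Chem.MolFromSmiles` returns `None` for a string it cannot parse. Printing `rdkit.__version__` first is a cheap way to catch the "wrong conda environment" problem mentioned above.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
M = TypeVar("M")


def tally_failures(
    records: Iterable[T], parse: Callable[[T], Optional[M]]
) -> Tuple[int, List[int]]:
    """Count records that parse and collect the indices of those that do not.

    `parse` must return None for a record it cannot handle, which is how
    RDKit's readers signal a failed molecule.
    """
    ok = 0
    failed: List[int] = []
    for index, record in enumerate(records):
        if parse(record) is None:
            failed.append(index)
        else:
            ok += 1
    return ok, failed


def report(sdf_path: str, smiles_list: List[str]) -> None:
    """Hypothetical driver: needs RDKit installed and a real SDF path."""
    import rdkit
    from rdkit import Chem

    # Confirm which RDKit build the active environment actually provides.
    print("RDKit version:", rdkit.__version__)

    # The supplier already yields None for unreadable records,
    # so the "parser" here just passes each item through.
    ok, failed = tally_failures(Chem.SDMolSupplier(sdf_path), lambda mol: mol)
    print(f"SDF: {ok} read, {len(failed)} rejected")

    ok, failed = tally_failures(smiles_list, Chem.MolFromSmiles)
    print(f"SMILES: {ok} parsed, {len(failed)} rejected")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # The two SMILES below are illustrative inputs, not ChEMBL records.
        report(sys.argv[1], ["c1ccccc1", "c1cccn1"])
```

Keeping the indices of the failed records, rather than only a count, makes it easy to go back to the input file and inspect exactly which entries were rejected.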
Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds
That molecule's SMILES is correctly rendered by RDKit, or at least by the version of RDKit behind Slack:

[image: Inline image 1]

-P.

On Mon, Aug 7, 2017 at 3:54 PM, Bennion, Brian wrote:

> The carbocations are in small heterocyclic molecules. see CHEMBL3815233
>
> Brian
Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds
The carbocations are in small heterocyclic molecules. See CHEMBL3815233.

Brian

From: Chris Swain
Sent: Monday, August 7, 2017 11:46:30 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

I've not tried to read in ChEMBL but I have tried to process other large datasets e.g. ZINC. My impression was that problems arose with small heterocyclic systems, particularly if fused or containing multiple different heteroatoms. I did wonder if the different aromaticity models might be the issue.

Chris
[Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds
I've not tried to read in ChEMBL, but I have tried to process other large datasets, e.g. ZINC. My impression was that problems arose with small heterocyclic systems, particularly if fused or containing multiple different heteroatoms. I did wonder if the different aromaticity models might be the issue.

Chris
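One way to check whether rejected heterocycles are failing for aromaticity-related reasons is to parse with sanitization switched off and then sanitize explicitly, so the error message can be captured. The sketch below assumes RDKit is installed; `sanitization_error` is a hypothetical helper name, and the SMILES in the usage part are textbook illustrations rather than entries taken from ChEMBL or ZINC. `c1cccn1` is the commonly cited case: pyrrole written without its N-H, which RDKit cannot kekulize, whereas `c1ccc[nH]1` is accepted.

```python
from typing import Optional


def sanitization_error(smiles: str) -> Optional[str]:
    """Return None if RDKit accepts the SMILES, otherwise a short reason."""
    from rdkit import Chem  # imported lazily; requires RDKit to be installed

    # sanitize=False checks only the SMILES syntax and builds the raw graph.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return "SMILES syntax error"
    try:
        # Valence checks, kekulization and aromaticity perception run here.
        Chem.SanitizeMol(mol)
    except Exception as exc:  # exception class differs between RDKit versions
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Illustrative inputs only.
    for smi in ["c1ccc[nH]1", "c1cccn1", "C("]:
        print(smi, "->", sanitization_error(smi) or "ok")
```

Grouping the rejected records by the returned reason should show fairly quickly whether kekulization failures dominate, which would be consistent with the aromaticity-model explanation, or whether something else (for example valence errors) is responsible.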