I also got this to run with no problem in a Jupyter notebook.

BUT... I did see the error messages Milinda mentioned in the terminal that
was running the Jupyter notebook server.  If I do *from rdkit.Chem.Draw
import IPythonConsole* before running the code, I see all the
errors/warnings in Jupyter.

I think this version of the loop is a bit more informative (best to do with
IPythonConsole disabled):


    from rdkit import Chem
    from rdkit.Chem import Descriptors

    input_file = 'structures.sdf'
    suppl = Chem.SDMolSupplier(input_file)

    low_mass = 50
    high_mass = 1000
    ms = []

    for idx, mol in enumerate(suppl):
        # The supplier yields None when RDKit cannot build the molecule.
        if mol is None:
            print("No molecule: " + str(idx))
            continue
        try:
            if (mol and
                    round(Descriptors.ExactMolWt(mol), 4) >= low_mass and
                    round(Descriptors.ExactMolWt(mol), 4) <= high_mass):
                ms.append(mol)
        except:
            # Report any failure raised by the mass calculation itself.
            print("Error: " + str(idx))



It shows that all the problems are from rdkit failing to generate
molecules, i.e. the try/except isn't doing anything.  (Note it is bad
practice to have a naked *except*).

The first molecule that fails is #491, heparin sulfate.  The molecule can
be imported using *Chem.MolFromInchi()*. This gels nicely with the rdkit
error message for this molecule:

RDKit ERROR: [12:12:56] Unhandled CTAB feature: S group SRU on line: 75.
Molecule skipped.



The problem is thus the line M STY 1 1 SRU in the mol block, which you can
see if you do

    suppl.reset()
    for idx, mol in enumerate(suppl):
        if idx == 491:
            print(suppl.GetItemText(idx))

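To find other records with the same feature without going through RDKit, one option is to scan the raw SD file text. This is only a sketch with the standard library: the function name find_sgroup_records is my own, it assumes every record ends with a `$$$$` line and that S group types are declared on lines starting with `M  STY`, and the sample text is a made-up placeholder rather than real HMDB data.

```python
def find_sgroup_records(sdf_text, sgroup_type="SRU"):
    """Return (record_index, line_in_record) pairs for matching S groups.

    record_index is zero-based, so it lines up with enumerate(suppl)
    as long as every record in the file ends with a '$$$$' line.
    """
    hits = []
    record_idx = 0
    line_in_record = 0
    for line in sdf_text.splitlines():
        line_in_record += 1
        if line.strip() == "$$$$":
            # End of the current record: move on to the next one.
            record_idx += 1
            line_in_record = 0
            continue
        if line.startswith("M  STY"):
            # After the entry count come (sgroup number, type) pairs,
            # so the types sit at every second token from index 2.
            types = line[6:].split()[2::2]
            if sgroup_type in types:
                hits.append((record_idx, line_in_record))
    return hits


# Placeholder text shaped like two SD records (not valid mol blocks).
sample = "\n".join([
    "record0", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "$$$$",
    "record1", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  STY  1   1 SRU", "M  END", "$$$$",
])

print(find_sgroup_records(sample))
```

For the real file you would pass in the contents of structures.sdf instead of the placeholder string. The record indices should match what enumerate(suppl) reports, provided the file has no missing `$$$$` terminators.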

I don't know enough to pinpoint the precise reason for the error.  And
there are lots more errors to go through to get everything from HMDB into
RDKit, it seems.
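
To get an overview of what is left, one approach is to save the RDKit messages to a text file and tally them by type. A rough sketch using only the standard library: the helper name summarize_rdkit_log and the category labels are my own invention, the patterns are based only on the messages seen in this thread, and the sample lines are copied from Milinda's output with wrapped lines rejoined.

```python
import re
from collections import Counter

# Patterns are guesses based on the messages quoted in this thread.
_PATTERNS = [
    ("valence", re.compile(
        r"Explicit valence for atom # \d+ (\w+), \d+, is greater than permitted")),
    ("sanitize", re.compile(r"Could not sanitize molecule")),
    ("sgroup", re.compile(r"S group (\w+)")),
    ("stereo", re.compile(r"conflicting stereochemistry")),
]


def summarize_rdkit_log(lines):
    """Count log lines per category; unmatched lines go under 'other'."""
    counts = Counter()
    for line in lines:
        for name, pattern in _PATTERNS:
            match = pattern.search(line)
            if match:
                # Keep the element symbol or S group type in the key.
                if name in ("valence", "sgroup"):
                    name = name + ":" + match.group(1)
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts


sample = [
    "[13:15:14] ERROR: Could not sanitize molecule ending on line 1993855",
    "[13:15:14] Explicit valence for atom # 9 O, 3, is greater than permitted",
    "[13:15:16] Explicit valence for atom # 46 N, 4, is greater than permitted",
    "[13:15:18]  S group SUP ignored on line 2836416",
    "[13:15:19] Warning: conflicting stereochemistry at atom 10 ignored.",
    "[13:15:20]  Unhandled CTAB feature: S group SRU on line: 3205922. Molecule skipped.",
]

summary = summarize_rdkit_log(sample)
for key, n in sorted(summary.items()):
    print(key, n)
```

Note that this counts messages, not molecules: in the output above the same valence problem is reported more than once per molecule, so the totals would overstate the number of failed records.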

Curt

On Wed, Jan 11, 2017 at 11:39 AM, Steve O'Hagan <soha...@manchester.ac.uk>
wrote:

> With same code and fresh file download, works fine for me without error.
>
> ms contains 35177 molecules. Perhaps your download was corrupt?
>
>
> On 11/01/2017 18:26, Milinda Samaraweera wrote:
>
> Dear Experts,
>
> I was trying to read in the attached SD file (downloaded from HMDB) and
> trying to calculate the exact mass of each entry:
> structures.sdf
> <https://drive.google.com/file/d/0B3AmIbK_SzZhdGY3NVgyMDJiQjA/view?usp=drive_web>
>
> from rdkit import Chem
> from rdkit.Chem import Descriptors
>
> suppl  = Chem.SDMolSupplier(input_file)
>
> low_mass=50
> high_mass=1000
>
> ms = []
>
> for mol in suppl:
>     if mol is None:
>         continue
>     try:
>         if (mol and round(Descriptors.ExactMolWt(mol), 4) >= low_mass
>                 and round(Descriptors.ExactMolWt(mol), 4) <= high_mass):
>             ms.append(mol)
>     except:
>         pass
>
> By running the script, I got a barrage of errors as:
>
> [13:15:14] ERROR: Could not sanitize molecule ending on line 1993855
> [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than
> permitted
> [13:15:14] Explicit valence for atom # 9 O, 3, is greater than permitted
> [13:15:14] ERROR: Could not sanitize molecule ending on line 1994014
> [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than
> permitted
> [13:15:14] Explicit valence for atom # 9 O, 3, is greater than permitted
> [13:15:14] ERROR: Could not sanitize molecule ending on line 1996036
> [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than
> permitted
> [13:15:16] Explicit valence for atom # 46 N, 4, is greater than permitted
> [13:15:16] ERROR: Could not sanitize molecule ending on line 2302532
> [13:15:16] ERROR: Explicit valence for atom # 46 N, 4, is greater than
> permitte
> [13:15:16] Explicit valence for atom # 16 N, 4, is greater than permitted
> [13:15:16] ERROR: Could not sanitize molecule ending on line 2302918
> [13:15:16] ERROR: Explicit valence for atom # 16 N, 4, is greater than
> permitte
> [13:15:17] Explicit valence for atom # 11 N, 4, is greater than permitted
> [13:15:17] ERROR: Could not sanitize molecule ending on line 2556541
> [13:15:17] ERROR: Explicit valence for atom # 11 N, 4, is greater than
> permitte
> [13:15:18]  S group SUP ignored on line 2836416
> [13:15:18] Explicit valence for atom # 1 Cl, 4, is greater than permitted
> [13:15:18] ERROR: Could not sanitize molecule ending on line 2841449
> [13:15:18] ERROR: Explicit valence for atom # 1 Cl, 4, is greater than
> permitte
> [13:15:19] Warning: conflicting stereochemistry at atom 10 ignored.
> [13:15:19] Warning: conflicting stereochemistry at atom 10 ignored.
> [13:15:19] Warning: conflicting stereochemistry at atom 17 ignored.
> [13:15:19] Warning: conflicting stereochemistry at atom 17 ignored.
> [13:15:19] Explicit valence for atom # 3 B, 4, is greater than permitted
> [13:15:19] ERROR: Could not sanitize molecule ending on line 3107498
> [13:15:19] ERROR: Explicit valence for atom # 3 B, 4, is greater than
> permitted
> [13:15:19] Warning: conflicting stereochemistry at atom 6 ignored.
> [13:15:19] Warning: conflicting stereochemistry at atom 6 ignored.
> [13:15:20]  Unhandled CTAB feature: S group SRU on line: 3205922. Molecule
> skip
> [13:15:20] Explicit valence for atom # 0 Mg, 4, is greater than permitted
> [13:15:20] ERROR: Could not sanitize molecule ending on line 3222378
> [13:15:20] ERROR: Explicit valence for atom # 0 Mg, 4, is greater than
> permitte
> [13:15:20] Explicit valence for atom # 2 N, 4, is greater than permitted
> [13:15:20] ERROR: Could not sanitize molecule ending on line 3265386
> [13:15:20] ERROR: Explicit valence for atom # 2 N, 4, is greater than
> permitted
> [13:15:20] Explicit valence for atom # 31 N, 4, is greater than permitted
> [13:15:20] ERROR: Could not sanitize molecule ending on line 3305754
> [13:15:20] ERROR: Explicit valence for atom # 31 N, 4, is greater than
> permitte
> [13:15:21] Explicit valence for atom # 45 N, 4, is greater than permitted
> [13:15:21] ERROR: Could not sanitize molecule ending on line 3437055
> [13:15:21] ERROR: Explicit valence for atom # 45 N, 4, is greater than
> permitte
> [13:15:56] Explicit valence for atom # 3 C, 5, is greater than permitted
> [13:15:56] ERROR: Could not sanitize molecule ending on line 8391489
> [13:15:56] ERROR: Explicit valence for atom # 3 C, 5, is greater than
> permitted
>
> What causes these errors? Is there a way to suppress or solve them, or a
> way to stop printing them in the command prompt?
>
> --
> Thanks,
> Milinda Samaraweera,
>
>
>
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss