OK, error messages were hidden by IPython for me too.

I used KNIME to look at the SDF file, and it seems that the errors are "real": polymers, organometallic compounds, or completely daft structures. Two examples:



The structure in the first column is the input sdf.

The workflow was simple:


"RDKit From Molecule" generated exactly the same 36 "broken" molecules as the Python script.

There's also one bad sdf record in the file.

Cheers,
Steve.


On 11/01/2017 20:17, Curt Fischer wrote:
I also got this to run with no problem in a Jupyter notebook.

BUT... I did see the error messages Milinda mentioned in the terminal that was running the Jupyter notebook server. If I do *from rdkit.Chem.Draw import IPythonConsole* before running the code, I see all the errors/warnings in Jupyter.
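If the goal is only to stop the messages being printed (Milinda's last question), RDKit's log channels can be switched off from Python. A minimal sketch using *RDLogger.DisableLog* / *RDLogger.EnableLog* with the *'rdApp.*'* pattern; the helper name *set_rdkit_logging* is hypothetical:

```python
def set_rdkit_logging(enabled):
    """Switch all RDKit log channels (rdApp.error, rdApp.warning, ...) on or off.

    Hypothetical helper name. RDKit is imported inside the function so this
    snippet can be loaded even where RDKit is not installed.
    """
    from rdkit import RDLogger

    if enabled:
        RDLogger.EnableLog('rdApp.*')
    else:
        RDLogger.DisableLog('rdApp.*')


# Example: silence the messages while reading the file, then restore them.
# set_rdkit_logging(False)
# ... read structures.sdf here ...
# set_rdkit_logging(True)
```

This only hides the messages; the failing records are still returned as None by the supplier.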

I think this version of the loop is a bit more informative (best to do with IPythonConsole disabled):

    from rdkit import Chem
    from rdkit.Chem import Descriptors

    input_file = 'structures.sdf'
    suppl = Chem.SDMolSupplier(input_file)
    low_mass = 50
    high_mass = 1000
    ms = []
    for idx, mol in enumerate(suppl):
        if mol is None:
            # RDKit could not build a molecule from this record
            print("No molecule: " + str(idx))
            continue
        try:
            # compute the exact mass once and keep molecules inside the window
            mass = round(Descriptors.ExactMolWt(mol), 4)
            if low_mass <= mass <= high_mass:
                ms.append(mol)
        except Exception as e:
            # only reached if the mass calculation itself fails
            print("Error: {} {}".format(idx, e))



It shows that all the problems come from RDKit failing to generate molecules (the supplier returns None), i.e. the try/except never catches anything. (Note that it is bad practice to use a bare *except*.)
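One way to separate records RDKit cannot parse at all (such as the SRU record discussed here) from records that parse but fail sanitization (the valence errors) is to read with *sanitize=False* and sanitize each molecule yourself. A sketch assuming *SDMolSupplier(..., sanitize=False)* and *Chem.SanitizeMol(..., catchErrors=True)* behave as in recent RDKit releases; the function name *triage_sdf* is hypothetical:

```python
def triage_sdf(path):
    """Split SD records into sanitized, unsanitizable and unparseable groups.

    Returns (good, bad_sanitize, unparsed): good is a list of (index, mol),
    bad_sanitize a list of (index, name of the failing sanitization step),
    unparsed a list of indices. Hypothetical helper; RDKit imported lazily.
    """
    from rdkit import Chem

    good, bad_sanitize, unparsed = [], [], []
    suppl = Chem.SDMolSupplier(path, sanitize=False)
    for idx, mol in enumerate(suppl):
        if mol is None:
            # The CTAB itself could not be read (e.g. unsupported S groups)
            unparsed.append(idx)
            continue
        # With catchErrors=True the failing step is returned instead of raised
        failed = Chem.SanitizeMol(mol, catchErrors=True)
        if failed == Chem.SanitizeFlags.SANITIZE_NONE:
            good.append((idx, mol))
        else:
            bad_sanitize.append((idx, str(failed)))
    return good, bad_sanitize, unparsed
```

Records in the second group could then be inspected individually instead of being silently dropped.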

The first molecule that fails is #491, heparin sulfate. The molecule can, however, be imported from its InChI using *Chem.MolFromInchi()*. This fits nicely with the RDKit error message for this molecule:

    RDKit ERROR: [12:12:56] Unhandled CTAB feature: S group SRU on
    line: 75. Molecule skipped.
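Since the supplier returns None for such a record, the InChI for the *Chem.MolFromInchi()* route has to come from the raw record text, e.g. from *suppl.GetItemText(idx)*. A minimal pure-Python sketch for pulling one data field out of an SD record; the helper name is hypothetical, and the tag *INCHI_IDENTIFIER* is an assumption about how HMDB labels the InChI, so check a record first:

```python
import re


def get_sd_field(record_text, tag):
    """Return the value of data item <tag> in one SD record, or None.

    SD data items look like:
        > <TAG>
        value line(s)
        (blank line)
    Multi-line values are joined with newlines. Hypothetical helper.
    """
    lines = record_text.splitlines()
    # Header lines start with '>' and contain the tag in angle brackets
    header = re.compile(r'^>.*<' + re.escape(tag) + r'>')
    for i, line in enumerate(lines):
        if header.match(line):
            value = []
            for vline in lines[i + 1:]:
                # A blank line (or the record terminator) ends the data item
                if not vline.strip() or vline.startswith('$$$$'):
                    break
                value.append(vline)
            return '\n'.join(value)
    return None


# Possible use with RDKit (tag name assumed, not confirmed for HMDB):
# inchi = get_sd_field(suppl.GetItemText(491), 'INCHI_IDENTIFIER')
# mol = Chem.MolFromInchi(inchi) if inchi else None
```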



The problem is thus the line *M  STY  1   1 SRU* (an S group of type SRU, a structural repeat unit as used for polymers) in the mol block, which you can see if you do:

    suppl.reset()
    for idx, mol in enumerate(suppl):
        if idx == 491:
            print(suppl.GetItemText(idx))
            break
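To list which records carry such S groups without relying on the log, one can scan each record's *M  STY* lines. A minimal pure-Python sketch, assuming V2000 mol blocks where the line has the form *M  STYnn8 sss ttt ...* (a count followed by index/type pairs); the function name is hypothetical:

```python
def sgroup_types(molblock):
    """Return the S-group types declared on 'M  STY' lines of a V2000 mol block.

    Each line looks like 'M  STY  1   1 SRU': a count followed by
    (S-group index, S-group type) pairs. Hypothetical helper.
    """
    types = []
    for line in molblock.splitlines():
        if line.startswith('M  STY'):
            fields = line[6:].split()
            count = int(fields[0])
            # fields[1::2] are the S-group indices, fields[2::2] their types
            types.extend(fields[2:2 + 2 * count:2])
    return types
```

Applied to *suppl.GetItemText(idx)* for every index, this would flag the SRU (polymer) records so they can be set aside or handled separately.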


I don't know enough to pinpoint the precise reason for the error. And there are lots more errors to go through to get everything from HMDB into RDKit, it seems.
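For an overview of what remains, one option is to capture the log (assuming RDKit writes these messages to stderr, as the terminal output here suggests, e.g. *python script.py 2> rdkit.log*) and tally the messages by kind. A minimal pure-Python sketch; the regular expressions are written against the message formats quoted in this thread and may need adjusting for other RDKit versions:

```python
import re
from collections import Counter

# Patterns based on the log lines quoted in this thread. Only the
# "ERROR: Explicit valence" variant is matched, so each failing record
# is counted once even though the message is logged twice.
VALENCE = re.compile(r'ERROR: Explicit valence for atom # \d+ (\w+), (\d+)')
UNHANDLED = re.compile(r'Unhandled CTAB feature: (.+?) on line')


def tally_rdkit_errors(log_text):
    """Count valence errors per (element, valence) and unhandled CTAB features."""
    counts = Counter()
    for line in log_text.splitlines():
        m = VALENCE.search(line)
        if m:
            counts[('valence', m.group(1), int(m.group(2)))] += 1
            continue
        m = UNHANDLED.search(line)
        if m:
            counts[('unhandled', m.group(1))] += 1
    return counts
```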

Curt

On Wed, Jan 11, 2017 at 11:39 AM, Steve O'Hagan <soha...@manchester.ac.uk> wrote:

    With the same code and a fresh file download, it works fine for me
    without errors.

    ms contains 35177 molecules. Perhaps your download was corrupt?


    On 11/01/2017 18:26, Milinda Samaraweera wrote:
    Dear Experts,

    I am trying to read in the attached SD file (downloaded from
    HMDB) and calculate the exact mass of each entry:

    structures.sdf
    <https://drive.google.com/file/d/0B3AmIbK_SzZhdGY3NVgyMDJiQjA/view?usp=drive_web>

    from rdkit import Chem
    from rdkit.Chem import Descriptors

    input_file = 'structures.sdf'
    suppl = Chem.SDMolSupplier(input_file)

    low_mass = 50
    high_mass = 1000

    ms = []

    for mol in suppl:
        if mol is None:
            continue
        try:
            # keep molecules whose exact mass lies inside the window
            mass = round(Descriptors.ExactMolWt(mol), 4)
            if low_mass <= mass <= high_mass:
                ms.append(mol)
        except Exception:
            pass

    Running the script, I got a barrage of errors such as:

    [13:15:14] ERROR: Could not sanitize molecule ending on line 1993855
    [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than permitted
    [13:15:14] Explicit valence for atom # 9 O, 3, is greater than permitted
    [13:15:14] ERROR: Could not sanitize molecule ending on line 1994014
    [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than permitted
    [13:15:14] Explicit valence for atom # 9 O, 3, is greater than permitted
    [13:15:14] ERROR: Could not sanitize molecule ending on line 1996036
    [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than permitted
    [13:15:16] Explicit valence for atom # 46 N, 4, is greater than permitted
    [13:15:16] ERROR: Could not sanitize molecule ending on line 2302532
    [13:15:16] ERROR: Explicit valence for atom # 46 N, 4, is greater than permitted
    [13:15:16] Explicit valence for atom # 16 N, 4, is greater than permitted
    [13:15:16] ERROR: Could not sanitize molecule ending on line 2302918
    [13:15:16] ERROR: Explicit valence for atom # 16 N, 4, is greater than permitted
    [13:15:17] Explicit valence for atom # 11 N, 4, is greater than permitted
    [13:15:17] ERROR: Could not sanitize molecule ending on line 2556541
    [13:15:17] ERROR: Explicit valence for atom # 11 N, 4, is greater than permitted
    [13:15:18]  S group SUP ignored on line 2836416
    [13:15:18] Explicit valence for atom # 1 Cl, 4, is greater than permitted
    [13:15:18] ERROR: Could not sanitize molecule ending on line 2841449
    [13:15:18] ERROR: Explicit valence for atom # 1 Cl, 4, is greater than permitted
    [13:15:19] Warning: conflicting stereochemistry at atom 10 ignored.
    [13:15:19] Warning: conflicting stereochemistry at atom 10 ignored.
    [13:15:19] Warning: conflicting stereochemistry at atom 17 ignored.
    [13:15:19] Warning: conflicting stereochemistry at atom 17 ignored.
    [13:15:19] Explicit valence for atom # 3 B, 4, is greater than permitted
    [13:15:19] ERROR: Could not sanitize molecule ending on line 3107498
    [13:15:19] ERROR: Explicit valence for atom # 3 B, 4, is greater than permitted
    [13:15:19] Warning: conflicting stereochemistry at atom 6 ignored.
    [13:15:19] Warning: conflicting stereochemistry at atom 6 ignored.
    [13:15:20]  Unhandled CTAB feature: S group SRU on line: 3205922. Molecule skipped.
    [13:15:20] Explicit valence for atom # 0 Mg, 4, is greater than permitted
    [13:15:20] ERROR: Could not sanitize molecule ending on line 3222378
    [13:15:20] ERROR: Explicit valence for atom # 0 Mg, 4, is greater than permitted
    [13:15:20] Explicit valence for atom # 2 N, 4, is greater than permitted
    [13:15:20] ERROR: Could not sanitize molecule ending on line 3265386
    [13:15:20] ERROR: Explicit valence for atom # 2 N, 4, is greater than permitted
    [13:15:20] Explicit valence for atom # 31 N, 4, is greater than permitted
    [13:15:20] ERROR: Could not sanitize molecule ending on line 3305754
    [13:15:20] ERROR: Explicit valence for atom # 31 N, 4, is greater than permitted
    [13:15:21] Explicit valence for atom # 45 N, 4, is greater than permitted
    [13:15:21] ERROR: Could not sanitize molecule ending on line 3437055
    [13:15:21] ERROR: Explicit valence for atom # 45 N, 4, is greater than permitted
    [13:15:56] Explicit valence for atom # 3 C, 5, is greater than permitted
    [13:15:56] ERROR: Could not sanitize molecule ending on line 8391489
    [13:15:56] ERROR: Explicit valence for atom # 3 C, 5, is greater than permitted

    What causes these errors? Is there a way to suppress or solve them,
    or at least to stop them being printed in the command prompt?

-- Thanks,
    Milinda Samaraweera,



    
------------------------------------------------------------------------------
    Developer Access Program for Intel Xeon Phi Processors
    Access to Intel Xeon Phi processor-based developer platforms.
    With one year of Intel Parallel Studio XE.
    Training and support from Colfax.
    Order your platform today.http://sdm.link/xeonphi

    _______________________________________________
    Rdkit-discuss mailing list
    Rdkit-discuss@lists.sourceforge.net
    <mailto:Rdkit-discuss@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
    <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>

    