>  PDB files have no bond information,

This is not true. The chemistry of each residue is specified in the wwPDB Chemical Component Dictionary (CCD) and looked up by the residue identifier, so the file refers to a chemical description rather than embedding it.

https://www.wwpdb.org/data/ccd

https://github.com/pdbeurope/ccdutils

Paul.


On 27/09/2021 11:22, Lewis Martin wrote:
Very interesting - thank you Francois! PDB-REDO does the trick:

import requests
from rdkit import Chem

def getPDB(code):
    # PDB-REDO serves re-refined models at db/<code>/<code>_final.pdb
    out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
    out.raise_for_status()  # fail loudly on a missing entry instead of parsing an error page
    return out.text

pdb_string = getPDB('3udn')
mol = Chem.MolFromPDBBlock(pdb_string)
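If a model still fails to sanitize, one way to see which bond is at fault is to parse with sanitize=False and ask RDKit for the problems instead of letting it raise. A minimal sketch using Chem.DetectChemistryProblems; the helper name valence_problems is hypothetical, and the residue label is only filled in when the atoms carry PDB residue info (as they do after MolFromPDBBlock).

```python
from rdkit import Chem


def valence_problems(mol: Chem.Mol):
    """List atoms that would fail sanitization with a valence error.

    Returns (atom index, element, residue label, neighbour indices) tuples;
    the neighbour list usually exposes the spurious proximity bond.
    """
    report = []
    for problem in Chem.DetectChemistryProblems(mol):
        if problem.GetType() != "AtomValenceException":
            continue  # other problem types (e.g. kekulization) are skipped here
        idx = problem.GetAtomIdx()
        atom = mol.GetAtomWithIdx(idx)
        info = atom.GetPDBResidueInfo()  # None for non-PDB input
        label = (f"{info.GetResidueName()}{info.GetResidueNumber()}"
                 f":{info.GetName().strip()}" if info else "")
        neighbours = [n.GetIdx() for n in atom.GetNeighbors()]
        report.append((idx, atom.GetSymbol(), label, neighbours))
    return report


# Usage with a PDB string such as the one fetched above:
# mol = Chem.MolFromPDBBlock(pdb_string, sanitize=False)
# for row in valence_problems(mol):
#     print(row)
```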

I think this solves it for me, but if anyone knows how to infer correct bonding information without relying on distances, I'd love to hear it too! So far I've noticed that ParmEd and PDBFixer infer correct bonds, but they don't determine bond orders, so it's difficult to port the molecule into RDKit.
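On inferring bonds without trusting every short contact: one heuristic, assumed here rather than taken from RDKit or any template library, is to parse without sanitizing and then delete every bond that joins two different residues unless it is a backbone C-N peptide bond or an SG-SG disulfide. It still starts from distance-based bonds, it cannot catch a bogus bond inside a single residue, and it would also delete genuine covalent links (a covalently bound ligand, for instance), so those would need adding to the allowed set. The names prune_interresidue_bonds and ALLOWED_LINKS are hypothetical.

```python
from rdkit import Chem

# Inter-residue bonds allowed to survive, keyed by stripped PDB atom names
# (hypothetical whitelist: peptide C-N and disulfide SG-SG only).
ALLOWED_LINKS = {frozenset({"C", "N"}), frozenset({"SG"})}


def _residue_key(atom):
    info = atom.GetPDBResidueInfo()
    return (info.GetChainId(), info.GetResidueNumber(), info.GetInsertionCode())


def prune_interresidue_bonds(mol: Chem.Mol) -> Chem.Mol:
    """Remove proximity bonds between residues that no template would contain.

    Assumes every atom carries PDB residue info (true for MolFromPDBBlock).
    """
    rw = Chem.RWMol(mol)
    doomed = []
    for bond in rw.GetBonds():
        a, b = bond.GetBeginAtom(), bond.GetEndAtom()
        if _residue_key(a) == _residue_key(b):
            continue  # intra-residue bonds are left alone
        names = frozenset({a.GetPDBResidueInfo().GetName().strip(),
                           b.GetPDBResidueInfo().GetName().strip()})
        if names not in ALLOWED_LINKS:
            doomed.append((a.GetIdx(), b.GetIdx()))
    # Remove after iterating so the bond list is not modified mid-loop
    for i, j in doomed:
        rw.RemoveBond(i, j)
    out = rw.GetMol()
    Chem.SanitizeMol(out)  # raises if other problems remain
    return out
```

Usage would look like prune_interresidue_bonds(Chem.MolFromPDBBlock(pdb_string, sanitize=False)). It targets the kind of TYR71 O / THR72 side-chain contact described in the original message, but whether it is sufficient for 3UDN is untested.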

Cheers
Lewis



On Mon, Sep 27, 2021 at 5:55 PM Francois Berenger <mli...@ligand.eu> wrote:

    Hi Lewis,

    Just an idea: you might try to load your PDB in UCSF Chimera, then
    save it as a mol2 or sdf file.
    Then, try to read this sdf file from rdkit.

    Another idea: try to get your PDB file through the PDB-REDO service.
    https://pdb-redo.eu/
    They might have fixed a few things; maybe this PDB will read better in
    rdkit.

    Regards,
    F.

    On 26/09/2021 17:02, Lewis Martin wrote:
     > Hi RDKit,
     > While parsing proteins from the PDB with RDKit, I've come across
     > situations where the distance-based bond determination leads to
     > 'incorrect' bonds between atoms that are erroneously too close
     > together. PDB files have no bond information, so it's not really
     > 'incorrect' (rather the model coordinates are off), but the bonds are
     > nonphysical - and it means the Mol objects won't sanitize.
     >
     > Here's an example:
     >
     > import requests
     > from io import BytesIO
     > import gzip
     > from rdkit import Chem
     >
     > def getPDB(code):
     >     out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
     >     binary_stream = BytesIO(out.content)
     >     return gzip.open(binary_stream).read()
     >
     > pdb_string = getPDB('3udn')
     > Chem.MolFromPDBBlock(pdb_string)
     >
     > Error is:
     >
     > RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is
     > greater than permitted
     >
     > This is caused by the threonine 72 sidechain being too close to the
     > TYR71 backbone carbonyl oxygen (this can be visualized at
     > https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction&sele=09B ,
     > TYR71 is near the ligand).
     >
     > Does anyone know how to avoid this to create a Chem.Mol? I've tried
     > using Parmed and PDBFixer, since they use residue templates to
     > generate the correct bonding topology, but they don't write CONECT
     > records or SDFs, so the bonds are still lost to RDKit.
     >
     > Thanks for your time!
     > Lewis
     > PS - why not just use PDBFixer? I'm trying to calculate atom
     > invariants using RDKit's Morgan fingerprint implementation, so
     > ultimately I want a sanitized Mol object.
     >
     > _______________________________________________
     > Rdkit-discuss mailing list
     > Rdkit-discuss@lists.sourceforge.net
     > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


