Hi Rocco,

That is exactly what I was looking for. Thanks so much for your kind suggestion!

Massive Thanks,
Amy

From: Rocco Moretti <rmoretti...@gmail.com>
Date: Friday, October 27, 2023 at 12:30 PM
To: He, Amy <he.1...@buckeyemail.osu.edu>
Cc: rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Is there a Smiles library for common amino acids 
and ligands that can be used for AssignBondOrdersFromTemplate

I'll note that the official definitions for all the chemical entities in the 
PDB can be found in the wwPDB's Chemical Component Dictionary: 
https://www.wwpdb.org/data/ccd

That's in mmCIF format, but there are various SMILES and InChI definitions for 
the residues included in the file. (Your mileage may vary for the quality of 
those representations, though, especially for the rarer ones, but it should be 
no worse than the SDFs.)

You should be able to use an mmCIF parser to extract them.

e.g.
from mmcif.core.mmciflib import ParseCifSimple  # py-mmcif from the RCSB: `pip install mmcif`
ccd = ParseCifSimple("components.cif", True, 0, 255, "?", "logfile.txt")  # logfile.txt is an arbitrary name

ALA = ccd.GetBlock("ALA")
desc = ALA.GetTable("pdbx_chem_comp_descriptor")
print( desc.GetColumnNames() )
for ii in range(desc.GetNumRows()):
    print( desc.GetRow(ii) )

['comp_id', 'type', 'program', 'program_version', 'descriptor']
['ALA', 'SMILES', 'ACDLabs', '10.04', 'O=C(O)C(N)C']
['ALA', 'SMILES_CANONICAL', 'CACTVS', '3.341', 'C[C@H](N)C(O)=O']
['ALA', 'SMILES', 'CACTVS', '3.341', 'C[CH](N)C(O)=O']
['ALA', 'SMILES_CANONICAL', 'OpenEye OEToolkits', '1.5.0', 'C[C@@H](C(=O)O)N']
['ALA', 'SMILES', 'OpenEye OEToolkits', '1.5.0', 'CC(C(=O)O)N']
['ALA', 'InChI', 'InChI', '1.03', 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1']
['ALA', 'InChIKey', 'InChI', '1.03', 'QNAYBMKLOCPYGJ-REOHCLBHSA-N']

The components file is rather large, so parsing it can take a while.
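
As a rough sketch of what could be done with rows like the ones printed above, the snippet below picks one SMILES string per residue so it can go into a residue-name lookup. The helper name pick_smiles and the preference order are assumptions made for illustration; they are not part of py-mmcif or the CCD. The rows are copied from the ALA output above, so the snippet runs without the components file.

```python
# Hypothetical helper (not part of py-mmcif): choose one SMILES from the
# rows of a pdbx_chem_comp_descriptor table.
# Each row is [comp_id, type, program, program_version, descriptor].

# Assumed preference order, for illustration only: stereo-aware canonical
# SMILES first, then plain SMILES.
PREFERRED = (
    ("SMILES_CANONICAL", "OpenEye OEToolkits"),
    ("SMILES_CANONICAL", "CACTVS"),
    ("SMILES", "OpenEye OEToolkits"),
    ("SMILES", "CACTVS"),
    ("SMILES", "ACDLabs"),
)


def pick_smiles(rows, preferred=PREFERRED):
    """Return the first descriptor matching the preference order, or None."""
    by_key = {}
    for _comp_id, dtype, program, _version, descriptor in rows:
        # Keep the first descriptor seen for each (type, program) pair.
        by_key.setdefault((dtype, program), descriptor)
    for key in preferred:
        if key in by_key:
            return by_key[key]
    return None


# Rows copied from the ALA output shown above.
ala_rows = [
    ['ALA', 'SMILES', 'ACDLabs', '10.04', 'O=C(O)C(N)C'],
    ['ALA', 'SMILES_CANONICAL', 'CACTVS', '3.341', 'C[C@H](N)C(O)=O'],
    ['ALA', 'SMILES', 'CACTVS', '3.341', 'C[CH](N)C(O)=O'],
    ['ALA', 'SMILES_CANONICAL', 'OpenEye OEToolkits', '1.5.0', 'C[C@@H](C(=O)O)N'],
    ['ALA', 'SMILES', 'OpenEye OEToolkits', '1.5.0', 'CC(C(=O)O)N'],
    ['ALA', 'InChI', 'InChI', '1.03',
     'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1'],
    ['ALA', 'InChIKey', 'InChI', '1.03', 'QNAYBMKLOCPYGJ-REOHCLBHSA-N'],
]

ref_smi = {"ALA": pick_smiles(ala_rows)}
print(ref_smi)  # {'ALA': 'C[C@@H](C(=O)O)N'}
```

With the parsed file, the rows for a residue could presumably be collected as [desc.GetRow(ii) for ii in range(desc.GetNumRows())], using the same calls as in the example above. Note that these are SMILES for the free amino acid, so they may need trimming before they match a residue inside a chain.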

On Fri, Oct 27, 2023 at 10:55 AM He, Amy <he.1...@buckeyemail.osu.edu> wrote:
Dear RDKit experts,

I need your advice on finding a reference SMILES library from which to build 
the template molecule for AssignBondOrdersFromTemplate 
(https://www.rdkit.org/docs/source/rdkit.Chem.AllChem.html).

I am using AssignBondOrdersFromTemplate to perceive bonds residue by residue 
in an input PDB, using a reference SMILES library like this:

ref_smi = {
    "ALA": "NC(C)C(=O)",
    "GLY": "NCC(=O)",
    "ILE": "NC(C(C)CC)C(=O)",
}

I wonder whether there is an open reference library for the common amino acids 
and ligands that are present in PDB files. A previous post on rdkit-discuss 
(https://rdkit-discuss.narkive.com/JM2IGLQz/pdb-reader-and-bond-perception) 
points to this archive:
ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz
and useful links from
http://www.ebi.ac.uk/pdbe-srv/pdbechem/

But I am no longer able to access the contents.

I guess we could always generate SMILES from the standardized SDF files. Still, 
I am wondering whether there is an existing SMILES library (like a reference 
data file) from which we can retrieve the SMILES string using the residue names 
of common amino acids, and maybe also ligands.

Any comments or suggestions would be greatly appreciated. Thank you for your 
time and kind support in advance!


Bests,


--
Amy He
Chemistry Graduate Teaching Assistant
Hadad Lab
Ohio State University
he.1...@osu.edu


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss