On 25/10/13 08:09, James Davidson wrote:
> Hi Roger,
>
> Thanks for the response.
>
>> The use of an integer file format "flavor" argument allows the caller to
>> customize the behavior of the readers and writers.  The semantics is that a
>> reasonable default is zero (for all bits), but that new features may be added
>> without changing the API/ABI.
>> Most of the bits above (for the writer) control strict compliance with the PDB
>> format specification.  For example, a flavor of 12 will write bond orders the
>> way the RCSB expects them, both throwing away bond orders and increasing
>> the size of the PDB file.
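
To illustrate the bit-flag idea described above, here is a minimal sketch that
splits a flavor integer into the individual flags it sets. The helper name
set_flags is made up for this example and is not part of RDKit; what each bit
means is defined by the reader or writer, so check the RDKit documentation
before relying on a particular value.

```python
def set_flags(flavor):
    """Return the power-of-two flags that are set in an integer flavor.

    Hypothetical helper for illustration only; it says nothing about
    what each flag means to a particular reader or writer.
    """
    return [1 << i for i in range(flavor.bit_length()) if flavor & (1 << i)]


print(set_flags(12))  # [4, 8]
print(set_flags(0))   # []
```

So a flavor of 12 is simply the two flags 4 and 8 combined, and a flavor of 0
sets none of them.
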
> As a test, I am using 2VCI and retrieving the PDB data from the RCSB
> using the following:
>
> import requests
> url = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VCI"
> response = requests.get(url)
> pdb_block = response.content
> response.close()
>
>
> pdb_block shows CONECT records only for the HETATM records.
> If I now read into RDKit, using the defaults, and write back out using the
> defaults, I see CONECT records for every atom (i.e. the protein as well).  And I
> can't see any double bonds rendered in PyMOL:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> pdb = Chem.MolFromPDBBlock(pdb_block)
> pdb_block_out = Chem.MolToPDBBlock(pdb)
>
> First 10 CONECT records of output:
> CONECT    1    2
> CONECT    2    3    5
> CONECT    3    4    4   10
> CONECT    5    6
> CONECT    6    7
> CONECT    7    8    8    9
> CONECT   10   11
> CONECT   11   12   14
> CONECT   12   13   13   17
> CONECT   14   15   16
>
>
> If I use Chem.MolToPDBBlock(pdb, flavor=12) I do indeed see the ligand
> CONECT records in what looks like the original format (albeit now numbered
> differently), and I still see CONECT records for the protein - but this PDB
> *will* render double bonds in PyMOL.
>
> First 10 CONECT records of output:
> CONECT    3    4    4
> CONECT    7    8    8
> CONECT   12   13   13
> CONECT   19   20   20
> CONECT   23   24   24
> CONECT   28   29   29
> CONECT   35   36   36
> CONECT   38   39   39
> CONECT   40   42   42
> CONECT   41   43   43
>
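
When comparing outputs like the two listings above, it can help to tabulate
the CONECT records rather than read them by eye. The sketch below assumes the
block is a text string (not bytes) and that a bonded atom listed more than
once on a CONECT record is meant to indicate a multiple bond, which is a
common convention but not something every program honours. The function name
conect_bond_counts is made up for this example.

```python
from collections import Counter


def conect_bond_counts(pdb_block):
    """Count how often each (atom, partner) pair appears in CONECT records.

    Assumes fixed-width CONECT lines: the atom serial in columns 7-11,
    followed by bonded-atom serials in 5-character fields.
    """
    counts = Counter()
    for line in pdb_block.splitlines():
        if not line.startswith("CONECT"):
            continue
        # Slice the line into 5-character fields after the record name.
        fields = [line[i:i + 5].strip() for i in range(6, len(line), 5)]
        serials = [int(f) for f in fields if f]
        if not serials:
            continue
        atom, partners = serials[0], serials[1:]
        for partner in partners:
            counts[(atom, partner)] += 1
    return counts


# Three of the CONECT lines from the default output quoted above.
sample = (
    "CONECT    2    3    5\n"
    "CONECT    3    4    4   10\n"
    "CONECT    7    8    8    9\n"
)
counts = conect_bond_counts(sample)
repeated = sorted(pair for pair, n in counts.items() if n > 1)
print(repeated)  # [(3, 4), (7, 8)]
```

Running the same function on the blocks written with different flavor values
gives a quick way to see which pairs are listed more than once in each.
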

If I may be so bold, I believe an important part of the puzzle is
missing.  The residue name (also called the 3-letter code or comp-id)
in the PDB file is a pointer to an entry in the mmCIF-formatted chemical
component dictionary, which describes every compound in every entry
released by the PDB.

http://deposit.pdb.org/cc_dict_tut.html
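
As a rough sketch of the first step, the residue names that would need
looking up in the dictionary can be pulled from the HETATM records by column
position. The function name hetero_comp_ids and the two sample lines are made
up for illustration (they are not taken from 2VCI), and skipping water (HOH)
is just an assumption about what is of interest here.

```python
def hetero_comp_ids(pdb_block, skip=("HOH",)):
    """Return the sorted residue names (comp-ids) found on HETATM records.

    The residue name occupies columns 18-20 of a PDB ATOM/HETATM line.
    """
    ids = set()
    for line in pdb_block.splitlines():
        if line.startswith("HETATM") and len(line) >= 20:
            name = line[17:20].strip()
            if name and name not in skip:
                ids.add(name)
    return sorted(ids)


# Made-up sample records, for illustration only.
sample = (
    "HETATM 1720  C1  LIG A1224      10.000  20.000  30.000  1.00 20.00           C\n"
    "HETATM 1800  O   HOH A2001      11.000  21.000  31.000  1.00 20.00           O\n"
)
print(hetero_comp_ids(sample))  # ['LIG']
```

Each name returned this way can then be used to find the matching entry in
the chemical component dictionary.
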

If this is an "internal" PDB file, there will very likely be a similar
mmCIF file that was used for crystallographic refinement.

Only when these options fail would I consider turning to bond-order 
perception and CONECT records.

Paul.

