Re: [Rdkit-discuss] Rdkit atom indexing vs indexing in written pdb file

2017-02-01 Thread Susan Leung
Thank you very much Andrew!

Indeed, I did not spot the pattern - how silly of me!


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Rdkit atom indexing vs indexing in written pdb file

2017-02-01 Thread Andrew Dalke
Dear Susan,

  If I understand what's going on correctly, you have run across the difference 
between 0-based and 1-based indexing. See 
https://en.wikipedia.org/wiki/Zero-based_numbering .

RDKit, like most programming libraries and languages, indexes atoms by their 
offset from the beginning, so 0 means the beginning, 1 means one after the 
beginning, etc.

This is somewhat like how some buildings use "1" as the first floor above the 
ground, while others regard "1" as the ground floor, which is confusing if you 
are not used to it. (My apartment number says it's on the second floor, while 
the elevator button says I live on floor 3.)

On Feb 1, 2017, at 5:15 PM, Susan Leung wrote:
> I am producing rdkit conformers and writing them to pdb files but am finding 
> the atom indexing in rdkit is different from the written pdb.
  ...
> Here is my code and output (the C=O looks like it's atoms 3,4 in rdkit but 
> 4,5 in the pdb file):
  ...
> In [3]: mol = Chem.MolFromSmiles("CC1=C(C(=O)C)C=CC=C1")
  ...
> In [4]: mol.GetSubstructMatch(Chem.MolFromSmiles('C(=O)'))
> Out[4]: (3, 4)
  ...
>   record_name  atom_number blank_1 atom_name alt_loc residue_name blank_2  \
> 0      HETATM            1                C1                  UNL
> 1      HETATM            2                C2                  UNL
> 2      HETATM            3                C3                  UNL
> 3      HETATM            4                C4                  UNL
> 4      HETATM            5                O1                  UNL
> 5      HETATM            6                C5                  UNL
> 6      HETATM            7                C6                  UNL
> 7      HETATM            8                C7                  UNL
> 8      HETATM            9                C8                  UNL
> 9      HETATM           10                C9                  UNL


If I understand you correctly, then the RDKit atom indices (3, 4) correspond to 
the PDB atom numbers (3+1, 4+1) = (4, 5). That is, the RDKit indices match the 
left-most column of your table, rather than the atom_number column.
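
The conversion described above can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical (they are not part of RDKit), and the sketch assumes the PDB file numbers its atoms consecutively from 1 in the same order as the RDKit molecule, as in the output quoted above. It would not hold for a file whose atoms were renumbered or reordered.

```python
def rdkit_to_pdb_serial(indices):
    """Convert 0-based RDKit atom indices to 1-based PDB atom numbers.

    Hypothetical helper; assumes atoms were written in RDKit order,
    numbered consecutively starting at 1.
    """
    return tuple(i + 1 for i in indices)


def pdb_serial_to_rdkit(serials):
    """Convert 1-based PDB atom numbers back to 0-based RDKit indices."""
    if any(s < 1 for s in serials):
        raise ValueError("PDB atom numbers start at 1")
    return tuple(s - 1 for s in serials)


match = (3, 4)  # the substructure match reported by RDKit in this thread
serials = rdkit_to_pdb_serial(match)
print(serials)  # (4, 5)
```

Going the other way, pdb_serial_to_rdkit((4, 5)) gives back (3, 4).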

Cheers,

Andrew
da...@dalkescientific.com





[Rdkit-discuss] Rdkit atom indexing vs indexing in written pdb file

2017-02-01 Thread Susan Leung
Dear all,

I am producing rdkit conformers and writing them to pdb files but am finding 
the atom indexing in rdkit is different from the written pdb. I would like the 
two to agree because I want to do a substructure search (using rdkit) to give 
me a handle on these atoms in the pdb file.

Apologies if this has been discussed before.

Here is my code and output (the C=O looks like it's atoms 3,4 in rdkit but 4,5 
in the pdb file):

Thanks,

Susan

*

In [1]: import rdkit

In [2]: from rdkit import Chem
   ...: from rdkit.Chem import AllChem
   ...: from rdkit.Chem.Draw import IPythonConsole
   ...:

In [3]: mol = Chem.MolFromSmiles("CC1=C(C(=O)C)C=CC=C1")
   ...: idx = AllChem.EmbedMultipleConfs(mol,numConfs=1,randomSeed=0xf00d,
   ...:     useExpTorsionAnglePrefs=True,useBasicKnowledge=True)
   ...:

In [4]: mol.GetSubstructMatch(Chem.MolFromSmiles('C(=O)'))
Out[4]: (3, 4)

In [5]: Chem.MolToPDBFile(mol,'./test.pdb')

In [6]: import biopandas
   ...: from biopandas.pdb import PandasPDB
   ...: ppdb = PandasPDB()
   ...: ppdb.read_pdb('./test.pdb')
   ...: ppdb.df['HETATM']
   ...:
Out[6]:
  record_name  atom_number blank_1 atom_name alt_loc residue_name blank_2  \
0      HETATM            1                C1                  UNL
1      HETATM            2                C2                  UNL
2      HETATM            3                C3                  UNL
3      HETATM            4                C4                  UNL
4      HETATM            5                O1                  UNL
5      HETATM            6                C5                  UNL
6      HETATM            7                C6                  UNL
7      HETATM            8                C7                  UNL
8      HETATM            9                C8                  UNL
9      HETATM           10                C9                  UNL

  chain_id  residue_number insertion  ...  x_coord  y_coord  z_coord  \
0                        1            ...    0.176    1.911    1.137
1                        1            ...   -0.513    0.759    0.511
2                        1            ...    0.272   -0.184   -0.139
3                        1            ...    1.717   -0.056   -0.210
4                        1            ...    2.406   -0.917   -0.801
5                        1            ...    2.344    1.118    0.435
6                        1            ...   -0.332   -1.286   -0.743
7                        1            ...   -1.696   -1.416   -0.682
8                        1            ...   -2.495   -0.504   -0.048
9                        1            ...   -1.879    0.575    0.540

   occupancy  b_factor  blank_4 segment_id element_symbol charge  line_idx
0        1.0       0.0                                  C    NaN         0
1        1.0       0.0                                  C    NaN         1
2        1.0       0.0                                  C    NaN         2
3        1.0       0.0                                  C    NaN         3
4        1.0       0.0                                  O    NaN         4
5        1.0       0.0                                  C    NaN         5
6        1.0       0.0                                  C    NaN         6
7        1.0       0.0                                  C    NaN         7
8        1.0       0.0                                  C    NaN         8
9        1.0       0.0                                  C    NaN         9

[10 rows x 21 columns]
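
To make the relationship between the two numberings concrete, here is a minimal sketch in plain Python that picks the matched atoms out of the HETATM output above by row position. The (atom_number, atom_name, element_symbol) values are copied from that output; the variable names are made up for illustration, and the sketch assumes the rows are in the same order as the atoms in the RDKit molecule.

```python
# (atom_number, atom_name, element_symbol) copied from the HETATM output above.
hetatm_rows = [
    (1, "C1", "C"), (2, "C2", "C"), (3, "C3", "C"), (4, "C4", "C"),
    (5, "O1", "O"), (6, "C5", "C"), (7, "C6", "C"), (8, "C7", "C"),
    (9, "C8", "C"), (10, "C9", "C"),
]

match = (3, 4)  # Out[4] from mol.GetSubstructMatch(...)

# Select by 0-based row position (the left-most column), not by atom_number.
matched_rows = [hetatm_rows[i] for i in match]

for rdkit_idx, (atom_number, atom_name, element) in zip(match, matched_rows):
    print(rdkit_idx, atom_number, atom_name, element)
# prints:
# 3 4 C4 C
# 4 5 O1 O
```

With the biopandas DataFrame itself, positional selection such as ppdb.df['HETATM'].iloc[list(match)] should pick out the same two rows, assuming the DataFrame keeps its default row order.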
