Dear Susan,

the reason is that, by default, PDBAtomLine() ignores records whose alternate location is anything other than ' ', 'A' or '1':
https://github.com/rdkit/rdkit/blob/e7e17adc4ef822d2663fa6e1ba5b978512c7a8b4/Code/GraphMol/FileParsers/PDBParser.cpp#L62

I have run into PDB files in the past that have a single alternate location which is none of the above (e.g., only 'B'). To avoid the problem you can set flavor=1 in MolFromPDBBlock():

m_altlocB = Chem.MolFromPDBBlock("""
HETATM 7886  C1 BG5K C 702      -0.945  12.634  14.174  0.51 48.52           C
HETATM 7887  C2 BG5K C 702      -0.880  12.457  12.854  0.51 49.59           C
HETATM 7888  C3 BG5K C 702      -2.175  12.105  12.250  0.51 50.84           C
HETATM 7889  O4 BG5K C 702      -3.162  12.016  12.973  0.51 56.22           O
HETATM 7890  N5 BG5K C 702      -2.196  11.880  10.884  0.51 46.36           N
HETATM 7891  C6 BG5K C 702      -3.381  11.554  10.123  0.51 42.61           C
HETATM 7892  C7 BG5K C 702      -2.994  11.820   8.683  0.51 34.05           C
HETATM 7893  C8 BG5K C 702      -1.767  12.664   8.829  0.51 37.06           C
HETATM 7894  C9 BG5K C 702      -1.088  12.006   9.994  0.51 39.00           C
HETATM 7895  N10BG5K C 702      -0.970  12.778   7.632  0.51 32.64           N
HETATM 7896  C11BG5K C 702      -0.892  13.901   6.897  0.51 31.98           C
HETATM 7897  N12BG5K C 702       0.291  14.135   6.316  0.51 32.89           N
HETATM 7898  C13BG5K C 702       0.365  15.236   5.562  0.51 37.92           C
HETATM 7899  C14BG5K C 702       1.667  15.577   4.898  0.51 45.61           C
HETATM 7900  N15BG5K C 702       2.013  14.870   3.674  0.51 54.36           N
HETATM 7901  C16BG5K C 702       2.736  13.630   3.932  0.51 61.64           C
HETATM 7902  C17BG5K C 702       3.129  13.030   2.594  0.51 67.85           C
HETATM 7903  O18BG5K C 702       1.993  12.804   1.757  0.51 70.80           O
HETATM 7904  C19BG5K C 702       1.368  14.060   1.475  0.51 61.74           C
HETATM 7905  C20BG5K C 702       0.884  14.647   2.784  0.51 59.26           C
HETATM 7906  C21BG5K C 702      -0.704  16.091   5.410  0.51 30.62           C
HETATM 7907  C22BG5K C 702      -1.867  15.765   6.060  0.51 27.18           C
HETATM 7908  N23BG5K C 702      -3.002  16.585   5.957  0.51 24.96           N
HETATM 7909  C24BG5K C 702      -4.148  16.590   6.655  0.51 25.09           C
HETATM 7910  N25BG5K C 702      -5.074  17.504   6.513  0.51 23.42           N
HETATM 7911  C26BG5K C 702      -6.122  17.226   7.351  0.51 23.35           C
HETATM 7912  C27BG5K C 702      -7.284  17.952   7.501  0.51 24.35           C
HETATM 7913  C28BG5K C 702      -8.201  17.482   8.421  0.51 25.73           C
HETATM 7914  C29BG5K C 702      -7.973  16.341   9.153  0.51 25.63           C
HETATM 7915  N30BG5K C 702      -6.855  15.637   9.017  0.51 29.00           N
HETATM 7916  C31BG5K C 702      -5.966  16.100   8.129  0.51 25.96           C
HETATM 7917  S32BG5K C 702      -4.462  15.337   7.792  0.51 27.85           S
HETATM 7918  N33BG5K C 702      -2.005  14.665   6.815  0.51 30.08           N
""", flavor=1)
Chem.MolToSmiles(m_altlocB)
'CC(O)N1CC[C@H](NC2NC(CN3CCOCC3)CC(NC3NC4CCCNC4S3)N2)C1'

Cheers,
p.

On Wed, Dec 9, 2020 at 6:07 PM Susan Leung <susanhle...@gmail.com> wrote:

> Hi all!
>
> I'm having problems reading in a PDB file with altloc B. See the code
> below, which tries to read in residue 702, chain C in 4kio.pdb
> (https://files.rcsb.org/view/4KIO.pdb). A mol object is returned but it
> hasn't got anything in it (or at least it returns an empty string when
> I do Chem.MolToSmiles() on it).
>
> In [1]: import rdkit
>
> In [2]: from rdkit import Chem
>
> In [3]: rdkit.__version__
> Out[3]: '2019.03.2'
>
> In [4]: m_altlocB = Chem.MolFromPDBBlock("""
>    ...: HETATM 7886  C1 BG5K C 702      -0.945  12.634  14.174  0.51 48.52           C
>    ...: HETATM 7887  C2 BG5K C 702      -0.880  12.457  12.854  0.51 49.59           C
>    ...: HETATM 7888  C3 BG5K C 702      -2.175  12.105  12.250  0.51 50.84           C
>    ...: HETATM 7889  O4 BG5K C 702      -3.162  12.016  12.973  0.51 56.22           O
>    ...: HETATM 7890  N5 BG5K C 702      -2.196  11.880  10.884  0.51 46.36           N
>    ...: HETATM 7891  C6 BG5K C 702      -3.381  11.554  10.123  0.51 42.61           C
>    ...: HETATM 7892  C7 BG5K C 702      -2.994  11.820   8.683  0.51 34.05           C
>    ...: HETATM 7893  C8 BG5K C 702      -1.767  12.664   8.829  0.51 37.06           C
>    ...: HETATM 7894  C9 BG5K C 702      -1.088  12.006   9.994  0.51 39.00           C
>    ...: HETATM 7895  N10BG5K C 702      -0.970  12.778   7.632  0.51 32.64           N
>    ...: HETATM 7896  C11BG5K C 702      -0.892  13.901   6.897  0.51 31.98           C
>    ...: HETATM 7897  N12BG5K C 702       0.291  14.135   6.316  0.51 32.89           N
>    ...: HETATM 7898  C13BG5K C 702       0.365  15.236   5.562  0.51 37.92           C
>    ...: HETATM 7899  C14BG5K C 702       1.667  15.577   4.898  0.51 45.61           C
>    ...: HETATM 7900  N15BG5K C 702       2.013  14.870   3.674  0.51 54.36           N
>    ...: HETATM 7901  C16BG5K C 702       2.736  13.630   3.932  0.51 61.64           C
>    ...: HETATM 7902  C17BG5K C 702       3.129  13.030   2.594  0.51 67.85           C
>    ...: HETATM 7903  O18BG5K C 702       1.993  12.804   1.757  0.51 70.80           O
>    ...: HETATM 7904  C19BG5K C 702       1.368  14.060   1.475  0.51 61.74           C
>    ...: HETATM 7905  C20BG5K C 702       0.884  14.647   2.784  0.51 59.26           C
>    ...: HETATM 7906  C21BG5K C 702      -0.704  16.091   5.410  0.51 30.62           C
>    ...: HETATM 7907  C22BG5K C 702      -1.867  15.765   6.060  0.51 27.18           C
>    ...: HETATM 7908  N23BG5K C 702      -3.002  16.585   5.957  0.51 24.96           N
>    ...: HETATM 7909  C24BG5K C 702      -4.148  16.590   6.655  0.51 25.09           C
>    ...: HETATM 7910  N25BG5K C 702      -5.074  17.504   6.513  0.51 23.42           N
>    ...: HETATM 7911  C26BG5K C 702      -6.122  17.226   7.351  0.51 23.35           C
>    ...: HETATM 7912  C27BG5K C 702      -7.284  17.952   7.501  0.51 24.35           C
>    ...: HETATM 7913  C28BG5K C 702      -8.201  17.482   8.421  0.51 25.73           C
>    ...: HETATM 7914  C29BG5K C 702      -7.973  16.341   9.153  0.51 25.63           C
>    ...: HETATM 7915  N30BG5K C 702      -6.855  15.637   9.017  0.51 29.00           N
>    ...: HETATM 7916  C31BG5K C 702      -5.966  16.100   8.129  0.51 25.96           C
>    ...: HETATM 7917  S32BG5K C 702      -4.462  15.337   7.792  0.51 27.85           S
>    ...: HETATM 7918  N33BG5K C 702      -2.005  14.665   6.815  0.51 30.08           N
>    ...: """)
>
> In [5]: Chem.MolToSmiles(m_altlocB)
> Out[5]: ''
>
> But if I change the altloc column, which is column 17, from B to A then
> it reads in and prints the SMILES fine.
>
> In [6]: m_altlocA = Chem.MolFromPDBBlock("""
>    ...: HETATM 7886  C1 AG5K C 702      -0.945  12.634  14.174  0.51 48.52           C
>    ...: HETATM 7887  C2 AG5K C 702      -0.880  12.457  12.854  0.51 49.59           C
>    ...: HETATM 7888  C3 AG5K C 702      -2.175  12.105  12.250  0.51 50.84           C
>    ...: HETATM 7889  O4 AG5K C 702      -3.162  12.016  12.973  0.51 56.22           O
>    ...: HETATM 7890  N5 AG5K C 702      -2.196  11.880  10.884  0.51 46.36           N
>    ...: HETATM 7891  C6 AG5K C 702      -3.381  11.554  10.123  0.51 42.61           C
>    ...: HETATM 7892  C7 AG5K C 702      -2.994  11.820   8.683  0.51 34.05           C
>    ...: HETATM 7893  C8 AG5K C 702      -1.767  12.664   8.829  0.51 37.06           C
>    ...: HETATM 7894  C9 AG5K C 702      -1.088  12.006   9.994  0.51 39.00           C
>    ...: HETATM 7895  N10AG5K C 702      -0.970  12.778   7.632  0.51 32.64           N
>    ...: HETATM 7896  C11AG5K C 702      -0.892  13.901   6.897  0.51 31.98           C
>    ...: HETATM 7897  N12AG5K C 702       0.291  14.135   6.316  0.51 32.89           N
>    ...: HETATM 7898  C13AG5K C 702       0.365  15.236   5.562  0.51 37.92           C
>    ...: HETATM 7899  C14AG5K C 702       1.667  15.577   4.898  0.51 45.61           C
>    ...: HETATM 7900  N15AG5K C 702       2.013  14.870   3.674  0.51 54.36           N
>    ...: HETATM 7901  C16AG5K C 702       2.736  13.630   3.932  0.51 61.64           C
>    ...: HETATM 7902  C17AG5K C 702       3.129  13.030   2.594  0.51 67.85           C
>    ...: HETATM 7903  O18AG5K C 702       1.993  12.804   1.757  0.51 70.80           O
>    ...: HETATM 7904  C19AG5K C 702       1.368  14.060   1.475  0.51 61.74           C
>    ...: HETATM 7905  C20AG5K C 702       0.884  14.647   2.784  0.51 59.26           C
>    ...: HETATM 7906  C21AG5K C 702      -0.704  16.091   5.410  0.51 30.62           C
>    ...: HETATM 7907  C22AG5K C 702      -1.867  15.765   6.060  0.51 27.18           C
>    ...: HETATM 7908  N23AG5K C 702      -3.002  16.585   5.957  0.51 24.96           N
>    ...: HETATM 7909  C24AG5K C 702      -4.148  16.590   6.655  0.51 25.09           C
>    ...: HETATM 7910  N25AG5K C 702      -5.074  17.504   6.513  0.51 23.42           N
>    ...: HETATM 7911  C26AG5K C 702      -6.122  17.226   7.351  0.51 23.35           C
>    ...: HETATM 7912  C27AG5K C 702      -7.284  17.952   7.501  0.51 24.35           C
>    ...: HETATM 7913  C28AG5K C 702      -8.201  17.482   8.421  0.51 25.73           C
>    ...: HETATM 7914  C29AG5K C 702      -7.973  16.341   9.153  0.51 25.63           C
>    ...: HETATM 7915  N30AG5K C 702      -6.855  15.637   9.017  0.51 29.00           N
>    ...: HETATM 7916  C31AG5K C 702      -5.966  16.100   8.129  0.51 25.96           C
>    ...: HETATM 7917  S32AG5K C 702      -4.462  15.337   7.792  0.51 27.85           S
>    ...: HETATM 7918  N33AG5K C 702      -2.005  14.665   6.815  0.51 30.08           N
>    ...: """)
>
> In [7]: Chem.MolToSmiles(m_altlocA)
> Out[7]: 'CC(O)N1CC[C@H](NC2NC(CN3CCOCC3)CC(NC3NC4CCCNC4S3)N2)C1'
>
> Am I doing something wrong? I also attach this code as a .ipynb.
>
> Many thanks in advance!
>
> Susan
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
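
P.S. For anyone who wants to check a file before parsing it, here is a minimal sketch in plain Python (no RDKit needed) that lists the alternate-location codes found in column 17 of the ATOM/HETATM records and reports the ones the default reader would skip. The helper names altloc_codes and skipped_by_default are made up for illustration and are not part of RDKit; the accepted set (' ', 'A', '1') is simply taken from the PDBParser.cpp line linked in this thread, so it may differ in other RDKit versions.

```python
# Hypothetical helpers for illustration only; they are not RDKit functions.
# Assumption: the default reader keeps only altloc ' ', 'A' or '1',
# as in the PDBParser.cpp line linked in this thread.
ACCEPTED_ALTLOCS = {" ", "A", "1"}


def altloc_codes(pdb_block):
    """Return the set of altloc characters (PDB column 17) in ATOM/HETATM records."""
    codes = set()
    for line in pdb_block.splitlines():
        # Column 17 is index 16 in a zero-based Python string.
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 17:
            codes.add(line[16])
    return codes


def skipped_by_default(pdb_block):
    """Return the altloc codes that fall outside the accepted set."""
    return altloc_codes(pdb_block) - ACCEPTED_ALTLOCS


example = (
    "HETATM 7886  C1 BG5K C 702      -0.945  12.634  14.174  0.51 48.52           C\n"
    "HETATM 7887  C2 BG5K C 702      -0.880  12.457  12.854  0.51 49.59           C\n"
)
print(skipped_by_default(example))  # prints {'B'}
```

If the only code reported is one that would be skipped, passing flavor=1 to MolFromPDBBlock() as described in this thread keeps those atoms.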