I'm working on a project where I cut a molecule along certain single bonds, to
find a core structure and one or more R-groups.
In yesterday's email, I mentioned a problem I have in creating a canonical
SMILES for the core when the R-groups are replaced by a hydrogen.
I also want to create a SMILES for the core where the R-groups are replaced
with a "-[*]". For example, given "c1ccccc1CCO" where the R-group is the "CCO",
I want to create the core "c1ccccc1[*]".
My code works except when the attachment point of the cut contains chiral
information. About 50% of the time, the chirality is inverted. The problem is
that chirality is a very fragile property which depends (I think) on the index
permutation order of the neighboring atoms to a given atom. It's tricky to
maintain that information during structure editing.
For the rest of this email I will present a reproducible, explain my
understanding, and sketch what seems a reasonable workaround. Let me know if I
am (in)correct in that. I'll then propose what I think may be a useful editing
function: ReplaceBond()
The following code snippet works like this. It starts with the input structure:
C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C
I cut the bond from the "[C@@]4" to the final "C", then add a "[*] to each side
of the cut. This produces the canonical core:
[*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
To verify if this is correct, I'll make some syntax changes directly on the
original SMILES string. I replace the final "C" character with "[*]". This
should be isomorphic to the expected core. I parse and canonicalize it to get
the expected core SMILES, which is:
[*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
Note that the first has a '@@' in the second atom while the second has a '@'.
========== Reproducible which shows that the chirality changes ====
from __future__ import print_function
from rdkit import Chem
# I want to cut the SMILES at the given atoms, to make a core and an R-group.
# I want the location of the attachment points to be marked [*] atoms.
# 1 1 1 1 1 1 1 1 11 2
2 2 2 2
# 0 1 2 3 4 567 890 1 2 3 4 5 6 7 89 0
1 2 3 4
smiles =
"C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C"
# will cut the bond between 12 and 24
^^^----------------------------------------^
core_atom, rgroup_atom = 12, 24
input_mol = Chem.MolFromSmiles(smiles)
assert rgroup_atom == input_mol.GetNumAtoms()-1, "this needs to be the terminal
'C'"
# Cut the bond and add two "[*]" atoms
emol = Chem.EditableMol(input_mol)
emol.RemoveBond(core_atom, rgroup_atom)
for atom in (core_atom, rgroup_atom):
new_atom = emol.AddAtom(Chem.Atom(0))
new_bond = emol.AddBond(atom, new_atom, Chem.BondType.SINGLE)
print("added bond", new_bond, "from", atom, "to", new_atom)
cut_mol = emol.GetMol()
cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True)
core_smiles = max(cut_smiles.split("."), key=len)
print("found core SMILES:\n", core_smiles)
# This prints:
#
[*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
# Compare it to what I expected.
# In this case I can simply replace the terminal methyl in the SMILES.
noncanonical_expected_core_smiles = smiles[:-1] + '[*]'
expected_mol = Chem.MolFromSmiles(noncanonical_expected_core_smiles)
expected_core_smiles = Chem.MolToSmiles(expected_mol, isomericSmiles=True)
print("expected core SMILES:\n", expected_core_smiles)
# This prints:
#
[*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
if core_smiles != expected_core_smiles:
print("!!!! Not the same !!!")
else:
print("Identical")
raise AssertionError("did not expect that")
======
Here is what I think is happening.
The problem occurs because the atom's chiral tag is connected to the
permutation order of its neighbors. I changed the parity of the permutation
order without updating the chiral tag.
I'll use the following bit of code to compute the parity of the neighbors for
the input structure and compare it to the parity of the neighbors of the 'cut'
structure.
====== Compute the parity for the configuration around the attachment atom on
the core
# This code snippet continues from the previous code
def parity(x):
# Use a Shell sort to count the number of swaps.
# (This takes O(N*N) time, which is is okay for 4 neighbors.)
N = len(x)
num_swaps = 0
for i in range(N-1):
for j in range(i+1, N):
if x[i] > x[j]:
x[i], x[j] = x[j], x[i]
num_swaps += 1
return (-1) ** num_swaps # either +1 or -1 symmetry
def get_bond_configuration(mol, atom):
# Return the parity of the atom indices of the neighbors
atom_obj = mol.GetAtomWithIdx(atom)
neighbors = [neighbor_atom_obj.GetIdx() for neighbor_atom_obj in
atom_obj.GetNeighbors()]
assert len(neighbors) == 4, (atom, neighbors)
return parity(neighbors)
input_configuration = get_bond_configuration(input_mol, 12)
print("input configuration:", input_configuration)
cut_configuration = get_bond_configuration(cut_mol, 12)
print("cut configuration:", cut_configuration)
assert input_configuration != cut_configuration, (
"I expected the configurations to be different")
======
This prints the output:
input configuration: -1
cut configuration: 1
which demonstrates that the parity has reversed.
I'll swap the chirality of the atom on the 'cut' molecule, generate the new
core SMILES, and show that it matches the expected SMILES:
======== correct chirality and compare the SMILES with the expected value
# This code snippet continues from the previous code
if cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CW:
new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CCW
elif cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CCW:
new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CW
else:
raise AssertionError("unknown chiral tag", cut_chiral_tag)
cut_atom_obj.SetChiralTag(new_chiral_tag)
updated_cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True)
updated_core_smiles = max(updated_cut_smiles.split("."), key=len)
print("updated core SMILES:\n", updated_core_smiles)
if updated_core_smiles == expected_core_smiles:
print("They match!")
else:
raise AssertionError("They do not match")
========
The result is the following:
updated core SMILES:
[*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
They match!
which tells me that the fix appears to have worked.
==== Proposal ===
I propose a new API method which looks like :
edit_mol.ReplaceBond(from_atom, old_atom, new_atom)
This will:
1) remember the bond type T connecting from_atom to old_atom
2) remove the bond from from_atom to old_atom
3) create a new bond of type T between from_atom and new_atom
4) update the chiral information so it's equivalent to having
new_atom in the same configuration position as old_atom.
This assumes that my understanding of the problem is correct, that there is no
existing/better way to do in RDKit, that it's useful enough that others may
want it, and that someone else has the time to implement it.
Cheers,
Andrew
[email protected]
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss