HI Andrew,

I don't have a solution for RDKit, because I don't know if you can do this
sort of thing.  But when I've tackled this in OEChem, I've changed the
atomic number of the substituent atom to something else (I always use Xe,
because I know that will never be anything other than a marker atom, but
you might change it to 0 for *) and then trim the atoms off that one (other
than the parent chiral atom).  That way, you never break the bond between
the core and the substituent and the chirality is preserved - it might
change from R to S, because that depends on the atomic numbers of the atoms
on the chiral atom (CPK rules) but the relative orders should remain the
same.

Hope that helps,

Dave


On Wed, Feb 3, 2016 at 4:48 AM, Andrew Dalke <da...@dalkescientific.com>
wrote:

> I'm working on a project where I cut a molecule along certain single
> bonds, to find a core structure and one or more R-groups.
>
> In yesterday's email, I mentioned a problem I have in creating a canonical
> SMILES for the core when the R-groups are replaced by a hydrogen.
>
> I also want to create a SMILES for the core where the R-groups are
> replaced with a "-[*]". For example, given "c1ccccc1CCO" where the R-group
> is the "CCO", I want to create the core "c1ccccc1[*]".
>
> My code works except when the attachment point of the cut contains chiral
> information. About 50% of the time, the chirality is inverted. The problem
> is that chirality is a very fragile property which depends (I think) on the
> index permutation order of the neighboring atoms to a given atom. It's
> tricky to maintain that information during structure editing.
>
> For the rest of this email I will present a reproducible, explain my
> understanding, and sketch what seems a reasonable workaround. Let me know
> if I am (in)correct in that. I'll then propose what I think may be a useful
> editing function: ReplaceBond()
>
> The following code snippet works like this. It starts with the input
> structure:
>
>    C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@
> ]1(CO)O)C)O)F)C
>
> I cut the bond from the "[C@@]4" to the final "C", then add a "[*] to
> each side of the cut. This produces the canonical core:
>
>    [*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H
> ](O)[C@@]12F
>
> To verify if this is correct, I'll make some syntax changes directly on
> the original SMILES string. I replace the final "C" character with "[*]".
> This should be isomorphic to the expected core. I parse and canonicalize it
> to get the expected core SMILES, which is:
>
>    [*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H
> ](O)[C@@]12F
>
> Note that the first has a '@@' in the second atom while the second has a
> '@'.
>
> ========== Reproducible which shows that the chirality changes ====
>
> from __future__ import print_function
> from rdkit import Chem
>
> # I want to cut the SMILES at the given atoms, to make a core and an
> R-group.
> # I want the location of the attachment points to be marked [*] atoms.
>
> #                                      1 1 1      1     1    1 1      1
> 11 2 2 2 2 2
> #         0 1     2 3     4     567  890 1 2      3     4    5 6      7
> 89 0 1 2 3 4
> smiles = "C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@
> ]1(CO)O)C)O)F)C"
> #    will cut the bond between 12 and 24
>  ^^^----------------------------------------^
> core_atom, rgroup_atom = 12, 24
>
> input_mol = Chem.MolFromSmiles(smiles)
> assert rgroup_atom == input_mol.GetNumAtoms()-1, "this needs to be the
> terminal 'C'"
>
> # Cut the bond and add two "[*]" atoms
> emol = Chem.EditableMol(input_mol)
> emol.RemoveBond(core_atom, rgroup_atom)
>
> for atom in (core_atom, rgroup_atom):
>     new_atom = emol.AddAtom(Chem.Atom(0))
>     new_bond = emol.AddBond(atom, new_atom, Chem.BondType.SINGLE)
>     print("added bond", new_bond, "from", atom, "to", new_atom)
>
> cut_mol = emol.GetMol()
> cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True)
>
> core_smiles = max(cut_smiles.split("."), key=len)
> print("found core SMILES:\n", core_smiles)
> # This prints:
> #   [*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H
> ](O)[C@@]12F
>
> # Compare it to what I expected.
> # In this case I can simply replace the terminal methyl in the SMILES.
> noncanonical_expected_core_smiles = smiles[:-1] + '[*]'
>
> expected_mol = Chem.MolFromSmiles(noncanonical_expected_core_smiles)
> expected_core_smiles = Chem.MolToSmiles(expected_mol, isomericSmiles=True)
> print("expected core SMILES:\n", expected_core_smiles)
> # This prints:
> #   [*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H
> ](O)[C@@]12F
>
>
> if core_smiles != expected_core_smiles:
>     print("!!!! Not the same !!!")
> else:
>     print("Identical")
>     raise AssertionError("did not expect that")
>
> ======
>
> Here is what I think is happening.
>
> The problem occurs because the atom's chiral tag is connected to the
> permutation order of its neighbors. I changed the parity of the permutation
> order without updating the chiral tag.
>
> I'll use the following bit of code to compute the parity of the neighbors
> for the input structure and compare it to the parity of the neighbors of
> the 'cut' structure.
>
> ======  Compute the parity for the configuration around the attachment
> atom on the core
> # This code snippet continues from the previous code
>
> def parity(x):
>     # Use a Shell sort to count the number of swaps.
>     # (This takes O(N*N) time, which is is okay for 4 neighbors.)
>     N = len(x)
>     num_swaps = 0
>     for i in range(N-1):
>         for j in range(i+1, N):
>             if x[i] > x[j]:
>                 x[i], x[j] = x[j], x[i]
>                 num_swaps += 1
>     return (-1) ** num_swaps # either +1 or -1 symmetry
>
> def get_bond_configuration(mol, atom):
>     # Return the parity of the atom indices of the neighbors
>     atom_obj = mol.GetAtomWithIdx(atom)
>     neighbors = [neighbor_atom_obj.GetIdx() for neighbor_atom_obj in
> atom_obj.GetNeighbors()]
>     assert len(neighbors) == 4, (atom, neighbors)
>     return parity(neighbors)
>
>
> input_configuration = get_bond_configuration(input_mol, 12)
> print("input configuration:", input_configuration)
>
> cut_configuration = get_bond_configuration(cut_mol, 12)
> print("cut configuration:", cut_configuration)
>
> assert input_configuration != cut_configuration, (
>     "I expected the configurations to be  different")
> ======
>
> This prints the output:
>
>   input configuration: -1
>   cut configuration: 1
>
> which demonstrates that the parity has reversed.
>
> I'll swap the chirality of the atom on the 'cut' molecule, generate the
> new core SMILES, and show that it matches the expected SMILES:
>
> ======== correct chirality and compare the SMILES with the expected value
> # This code snippet continues from the previous code
>
> if cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CW:
>     new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CCW
> elif cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CCW:
>     new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CW
> else:
>     raise AssertionError("unknown chiral tag", cut_chiral_tag)
>
> cut_atom_obj.SetChiralTag(new_chiral_tag)
>
>
> updated_cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True)
> updated_core_smiles = max(updated_cut_smiles.split("."), key=len)
> print("updated core SMILES:\n", updated_core_smiles)
>
> if updated_core_smiles == expected_core_smiles:
>     print("They match!")
> else:
>     raise AssertionError("They do not match")
> ========
>
> The result is the following:
>
>   updated core SMILES:
>    [*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H
> ](O)[C@@]12F
>   They match!
>
> which tells me that the fix appears to have worked.
>
>
> ==== Proposal ===
>
> I propose a new API method which looks like :
>
>    edit_mol.ReplaceBond(from_atom, old_atom, new_atom)
>
> This will:
>   1) remember the bond type T connecting from_atom to old_atom
>   2) remove the bond from from_atom to old_atom
>   3) create a new bond of type T between from_atom and new_atom
>   4) update the chiral information so it's equivalent to having
>       new_atom in the same configuration position as old_atom.
>
>
> This assumes that my understanding of the problem is correct, that there
> is no existing/better way to do in RDKit, that it's useful enough that
> others may want it, and that someone else has the time to implement it.
>
> Cheers,
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
> ------------------------------------------------------------------------------
> Site24x7 APM Insight: Get Deep Visibility into Application Performance
> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> Monitor end-to-end web transactions and take corrective actions now
> Troubleshoot faster and improve end-user experience. Signup Now!
> http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to