Thank you, Brian!

Actually, this is the output I expected:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
and so on

You pointed me in the right direction. I can store the old-to-new map numbers in a dict, and after relabeling and generating the canonical SMILES it will be easy to relabel the attachment points back.
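A minimal sketch of that relabel-back step, assuming the attachment points appear in the canonical SMILES as [*:N] and that a dict from new to old map numbers was collected during relabeling. The function name restore_map_numbers and the new_to_old dict below are hypothetical, not part of RDKit:

```python
import re


def restore_map_numbers(smiles, new_to_old):
    """Swap every [*:new] label in a SMILES string for [*:old].

    new_to_old is a hypothetical dict {new_map_number: old_map_number}
    that would be filled in while relabeling the attachment points.
    """
    def swap(match):
        new = int(match.group(1))
        # labels missing from the dict are left unchanged
        return "[*:%d]" % new_to_old.get(new, new)

    # a single pass over the string, so 1 -> 2 and 2 -> 3 cannot chain
    return re.sub(r"\[\*:(\d+)\]", swap, smiles)


canonical = "S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]"
new_to_old = {1: 2, 2: 3, 3: 1}  # illustrative values only
print(restore_map_numbers(canonical, new_to_old))
# prints S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
```

Working on the string keeps the canonical atom order untouched, whereas setting the old map numbers on the molecule and calling MolToSmiles again would re-canonicalize with the maps included.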
Thank you again!

Pavel.

On 05/27/2017 03:03 PM, Brian Kelley wrote:
Pavel, this isn't exactly trivial so I went ahead and made an example. The basics are that atomMaps are canonicalized, i.e. their value is used in the generation of smiles.

To solve this problem:
1) backup the atom maps and remove them
2) canonicalize *without* atom maps but figure out the order in which the atoms in the molecule are output
3) using the atom output order, relabel the atom maps based on output order.

That's a mouthful, but here's some code that should do the trick:

import ast

from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
       "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
       "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
       "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
       "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
       "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]


def CanonicalizeMaps(m, *a, **kw):
    # atom maps are canonicalized, so rename them
    #  figure out where they would have gone
    #  and relabel from 1...N based on output order
    atomMap = "molAtomMapNumber"
    backupAtomMap = "oldMolAtomMapNumber"
    for atom in m.GetAtoms():
        if atom.HasProp(atomMap):
            # keep the original map number in a backup property
            atomNum = atom.GetProp(atomMap)
            atom.SetProp(backupAtomMap, atomNum)
            atom.ClearProp(atomMap)

    # canonicalize without atom maps; only the side effect is needed here
    # (MolToSmiles records "_smilesAtomOutputOrder" on the molecule)
    Chem.MolToSmiles(m, *a, **kw)
    # where did the atoms end up in the output string?
    # the property is read as a string and parsed without eval()
    atoms = [(pos, atom_idx) for atom_idx, pos in enumerate(
        ast.literal_eval(m.GetProp("_smilesAtomOutputOrder")))]
    atommap = 1
    atoms.sort()

    # set the new atommap based on output position
    for pos, atom_idx in atoms:
        atom = m.GetAtomWithIdx(atom_idx)
        if atom.HasProp(backupAtomMap):
            atom.SetProp(atomMap, str(atommap))
            atommap += 1
    return Chem.MolToSmiles(m)


for s in smi:
    m = Chem.MolFromSmiles(s)
    print(CanonicalizeMaps(m, True))



Output:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]

Now, if you want the atomMaps in 1...2...3 output order, we could do that as well, but it is even trickier.

Enjoy,
 Brian

On Sat, May 27, 2017 at 8:36 AM, Pavel Polishchuk <pavel_polishc...@ukr.net> wrote:

    Hi,

      I cannot solve an issue and would like to ask for advice.
      If the attachment points of the same fragment carry different
    map numbers, different canonical SMILES are generated.
      I observed this behavior only for fragments with 3 attachment
    points. Below is an example.
      I'm looking for a solution/workaround that produces the "same"
    SMILES strings irrespective of the mapping, so that after removal
    of the map numbers the SMILES become identical.
      Any advice would be appreciated.

    from rdkit import Chem

    smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
           "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
           "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
           "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
           "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
           "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]

    for s in smi:
        print(Chem.MolToSmiles(Chem.MolFromSmiles(s)))

    output:
    S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
    S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
    S=c1c([*:1])c([*:3])[nH]c(Cl)c1[*:2]
    S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
    S=c1c([*:1])c([*:2])[nH]c(Cl)c1[*:3]
    S=c1c([*:2])c([*:1])[nH]c(Cl)c1[*:3]

    Kind regards,
    Pavel.

    
------------------------------------------------------------------------------
    Check out the vibrant tech community on one of the world's most
    engaging tech sites, Slashdot.org! http://sdm.link/slashdot
    _______________________________________________
    Rdkit-discuss mailing list
    Rdkit-discuss@lists.sourceforge.net
    <mailto:Rdkit-discuss@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
    <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>


