Thanks Greg!

I found an alternative solution, which is also not so straightforward: I set an isotope label on the aromatic atoms, generate isomeric SMILES, and then strip the label with a regex replacement.

But your suggestion to remove implicit hydrogens is important, since they can cause further ambiguity.


import re

from rdkit import Chem
from rdkit.Chem import RWMol, Atom

m = RWMol()

for i in range(3):
    a = Atom(6)
    a.SetNoImplicit(True)  # remove implicit Hs
    m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set aromatic
m.GetAtomWithIdx(0).SetIsotope(42)       # set isotope

m.GetAtomWithIdx(3).SetAtomMapNum(1)

m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

s = Chem.MolToSmiles(m, isomericSmiles=True)

re.sub(r'\[[0-9]+([a-z]+)H?[0-9]?\]', r'\1', s)  # remove the isotope label from the output SMILES

OUTPUT: 'CC(c)[*:1]'
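
To make the regex step above easier to check in isolation, here is a minimal sketch that applies the same pattern to hand-written strings. The helper name strip_isotope_tags and the sample SMILES are assumptions for illustration only; they are not taken from RDKit output.

```python
import re

# Same pattern as above: a bracket atom that starts with an isotope number
# followed by a lowercase (aromatic) symbol, optionally with an H count.
ISOTOPE_TAG = re.compile(r'\[[0-9]+([a-z]+)H?[0-9]?\]')


def strip_isotope_tags(smiles):
    """Replace isotope-tagged aromatic bracket atoms with the bare symbol."""
    return ISOTOPE_TAG.sub(r'\1', smiles)


# Hand-written sample strings (hypothetical, not generated by RDKit)
print(strip_isotope_tags('[42c]C(C)[*:1]'))
print(strip_isotope_tags('CC([42c])[*:1]'))
print(strip_isotope_tags('[13CH3]C'))
```

The pattern only matches lowercase symbols, so an isotope on an aliphatic atom such as [13CH3] is left alone, and [*:1] is untouched because nothing numeric follows the opening bracket. It would, however, also strip a genuine isotope label (and any H count) from an aromatic atom, so the tag value should be one that cannot occur in the real data.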

Pavel.




On 08/02/2017 06:24 AM, Greg Landrum wrote:
Hi Pavel,

It is, unfortunately, not that easy.
The canonicalization algorithm does not use atomic aromaticity when determining atom ordering, so as far as it is concerned there is no difference between atoms 0 and 2 in either of your examples. What does get used is the number of hydrogens, so you need to use that in order to get the results you are looking for.[1] For technical reasons, you also need to tell the RDKit that the atoms should not have implicit Hs attached. Here's a gist that works for me: https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843

Two notes:
1) I don't set the number of Hs on atom 1 in that gist, but I would suggest doing that too.
2) If atoms 0 and 2 have the same number of Hs attached, this still is not going to work if you're building things from fragments. The canonicalization code was not really designed to be used in situations like this.

-greg
[1] The details of the canonicalization algorithm, including the contents of the atom invariants, are described here: http://dx.doi.org/10.1021/acs.jcim.5b00543
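
A rough sketch of the approach described above (this is not the content of the linked gist): set an explicit H count on each carbon, turn off implicit Hs, and compare the SMILES for the two atom orderings. The helper build_fragment and the H counts chosen (0 on the aromatic atom, 1 on the central atom, 3 on the terminal one) are assumptions for illustration only.

```python
from rdkit import Chem


def build_fragment(aromatic_idx, methyl_idx):
    """Build the 4-atom fragment with explicit H counts on every carbon."""
    m = Chem.RWMol()
    for _ in range(3):
        a = Chem.Atom(6)
        a.SetNoImplicit(True)  # do not let RDKit add implicit Hs
        m.AddAtom(a)
    dummy = Chem.Atom(0)
    dummy.SetAtomMapNum(1)
    m.AddAtom(dummy)  # attachment point, index 3

    m.GetAtomWithIdx(aromatic_idx).SetIsAromatic(True)
    # H counts are assumptions chosen so that the two end atoms differ
    m.GetAtomWithIdx(aromatic_idx).SetNumExplicitHs(0)
    m.GetAtomWithIdx(1).SetNumExplicitHs(1)
    m.GetAtomWithIdx(methyl_idx).SetNumExplicitHs(3)

    for i, j in ((0, 1), (1, 2), (1, 3)):
        m.AddBond(i, j, Chem.BondType.SINGLE)
    m.UpdatePropertyCache(strict=False)  # compute valences without sanitizing
    return m


smi_a = Chem.MolToSmiles(build_fragment(aromatic_idx=0, methyl_idx=2))
smi_b = Chem.MolToSmiles(build_fragment(aromatic_idx=2, methyl_idx=0))
print(smi_a, smi_b, smi_a == smi_b)
```

Because the two end atoms now carry different H counts, the canonicalization has something to rank them by, so both orderings should give the same string. As note 2 says, this breaks down again when the two atoms have the same number of Hs.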


On Tue, Aug 1, 2017 at 2:53 PM, Pavel Polishchuk <pavel_polishc...@ukr.net> wrote:

    Hi all,

      Canonicalization of fragment SMILES does not work properly.
    Below are two examples of identical fragments. The only
    difference is the order of the atoms (their indices). However, it seems that
    RDKit canonicalization does not take atom types into account.

      Does anyone have an idea how to solve this issue at little cost?

    #1 ===========

    from rdkit import Chem
    from rdkit.Chem import RWMol, Atom

    m = RWMol()

    for i in range(3):
        a = Atom(6)
        m.AddAtom(a)
    a = Atom(0)
    m.AddAtom(a)

    m.GetAtomWithIdx(0).SetIsAromatic(True)  # set atom 0 as aromatic
    m.GetAtomWithIdx(3).SetAtomMapNum(1)


    m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
    m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
    m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

    Chem.MolToSmiles(m)

    OUTPUT: 'cC(C)[*:1]'

    #2 ===========

    m2 = RWMol()

    for i in range(3):
        a = Atom(6)
        m2.AddAtom(a)
    a = Atom(0)
    m2.AddAtom(a)

    m2.GetAtomWithIdx(2).SetIsAromatic(True) # set atom 2 as aromatic
    m2.GetAtomWithIdx(3).SetAtomMapNum(1)


    m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
    m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
    m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

    Chem.MolToSmiles(m2)

    OUTPUT: 'CC(c)[*:1]'


    Pavel.

    
------------------------------------------------------------------------------
    Check out the vibrant tech community on one of the world's most
    engaging tech sites, Slashdot.org! http://sdm.link/slashdot
    _______________________________________________
    Rdkit-discuss mailing list
    Rdkit-discuss@lists.sourceforge.net
    <mailto:Rdkit-discuss@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
    <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>


