Re: [Rdkit-discuss] canonical SMILES of a fragment

2017-08-02 Thread Pavel Polishchuk

Thanks Greg!

I found an alternative solution, which is also not so straightforward:
I set an isotope label on the aromatic atoms, generate isomeric SMILES,
and then strip the label with a regex replacement.

But your suggestion to remove implicit hydrogens is important, since
they can cause other ambiguities.



import re

from rdkit import Chem
from rdkit.Chem import Atom, RWMol

m = RWMol()

for i in range(3):
    a = Atom(6)
    a.SetNoImplicit(True)  # remove implicit Hs
    m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set aromatic
m.GetAtomWithIdx(0).SetIsotope(42)   # set isotope

m.GetAtomWithIdx(3).SetAtomMapNum(1)

m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

s = Chem.MolToSmiles(m, isomericSmiles=True)

# remove the isotope label from the output SMILES
re.sub(r'\[[0-9]+([a-z]+)H?[0-9]?\]', r'\1', s)


OUTPUT: 'CC(c)[*:1]'
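A possible refinement of the regex step, sketched here as an assumption and not taken from the thread: the pattern above strips any isotope from any lowercase bracket atom and also drops an explicit H count, so a genuine label such as [13c] would be removed too, and the H of a tagged [42nH] would be lost. The hypothetical helper `strip_tag` only touches the marker isotope (42, as in the code above) and keeps explicit H counts. Tagged atoms that carry a charge or an atom-map number are not matched and keep their tag.

```python
import re

# Hypothetical marker isotope; 42 is the value used in the example above.
TAG = 42


def strip_tag(smiles, tag=TAG):
    """Remove only the marker isotope from aromatic bracket atoms.

    [42c]  -> c     (no H count: brackets dropped, as in the output above)
    [42nH] -> [nH]  (explicit H count kept, so brackets stay)
    Other isotopes such as [13c] are left untouched.
    """
    # Tagged aromatic atoms with an explicit H count keep their brackets.
    smiles = re.sub(r"\[%d([a-z]+H[0-9]*)\]" % tag, r"[\1]", smiles)
    # Tagged aromatic atoms without Hs collapse to the bare symbol.
    return re.sub(r"\[%d([a-z]+)\]" % tag, r"\1", smiles)
```

For example, `strip_tag('CC([42c])[*:1]')` gives `'CC(c)[*:1]'`, while `strip_tag('[13c][42c]')` gives `'[13c]c'`. The input strings here are illustrative, not RDKit output captured from the thread.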

Pavel.




On 08/02/2017 06:24 AM, Greg Landrum wrote:

Hi Pavel,

It is, unfortunately, not that easy.
The canonicalization algorithm does not use atomic aromaticity when 
determining atom ordering, so as far as it is concerned there is no 
difference between atoms 0 and 2 in either of your examples. What does 
get used is the number of hydrogens, so you need to use that in order 
to get the results you are looking for.[1] For technical reasons, you 
also need to tell the RDKit that the atoms should not have implicit Hs 
attached. Here's a gist that works for me: 
https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843


Two notes:
 1) I don't set the number of Hs on atom 1 in that gist, but I would 
suggest doing that too.
 2) If atoms 0 and 2 have the same number of Hs attached, this still 
is not going to work if you're building things from fragments. The 
canonicalization code was not really designed to be used in situations 
like this.


-greg
[1] The details of the canonicalization algorithm, including the 
contents of the atom invariants, are described here: 
http://dx.doi.org/10.1021/acs.jcim.5b00543



On Tue, Aug 1, 2017 at 2:53 PM, Pavel Polishchuk wrote:


Hi all,

Canonicalization of fragment SMILES does not work properly. Below are
two examples of identical fragments; the only difference is the order
of the atoms (their indices). However, it seems that RDKit
canonicalization does not take atom types into account.

Does anyone have an idea how to solve this issue with minimal losses?

#1 ===

from rdkit import Chem
from rdkit.Chem import Atom, RWMol

m = RWMol()

for i in range(3):
    a = Atom(6)
    m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set atom 0 as aromatic
m.GetAtomWithIdx(3).SetAtomMapNum(1)

m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m)

OUTPUT: 'cC(C)[*:1]'

#2 ===

m2 = RWMol()

for i in range(3):
    a = Atom(6)
    m2.AddAtom(a)
a = Atom(0)
m2.AddAtom(a)

m2.GetAtomWithIdx(2).SetIsAromatic(True)  # set atom 2 as aromatic
m2.GetAtomWithIdx(3).SetAtomMapNum(1)

m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m2)

OUTPUT: 'CC(c)[*:1]'


Pavel.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



Re: [Rdkit-discuss] canonical SMILES of a fragment

2017-08-01 Thread Greg Landrum
Hi Pavel,

It is, unfortunately, not that easy.
The canonicalization algorithm does not use atomic aromaticity when
determining atom ordering, so as far as it is concerned there is no
difference between atoms 0 and 2 in either of your examples. What does get
used is the number of hydrogens, so you need to use that in order to get
the results you are looking for.[1] For technical reasons, you also need to
tell the RDKit that the atoms should not have implicit Hs attached. Here's
a gist that works for me:
https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843

Two notes:
 1) I don't set the number of Hs on atom 1 in that gist, but I would
suggest doing that too.
 2) If atoms 0 and 2 have the same number of Hs attached, this still is not
going to work if you're building things from fragments. The
canonicalization code was not really designed to be used in situations like
this.

-greg
[1] The details of the canonicalization algorithm, including the contents
of the atom invariants, are described here:
http://dx.doi.org/10.1021/acs.jcim.5b00543
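To make the tie-breaking argument concrete, here is a toy model in plain Python. It is not RDKit's canonicalization (the real atom invariants are described in the paper cited above); it only ranks atoms by a simplified invariant of (atomic number, degree, optionally H count) that, like the behaviour described here, ignores aromaticity, and it falls back to the input index for any remaining tie. The H counts (0 on the aromatic carbon, 3 on the other terminal carbon, 1 on the central carbon) are hypothetical values chosen for illustration.

```python
# Toy illustration only; NOT RDKit's actual canonical ranking.


def make_fragment(aromatic_idx, h_aromatic=0, h_methyl=3):
    """Build the 4-atom fragment from the question as a list of dicts.

    Atoms 0 and 2 are terminal carbons; `aromatic_idx` picks which one is
    flagged aromatic. Atom 1 is the central carbon, atom 3 the dummy atom.
    H counts are hypothetical.
    """
    atoms = []
    for i in range(3):
        if i == 1:
            atoms.append({"z": 6, "degree": 3, "aromatic": False, "h": 1})
        else:
            arom = i == aromatic_idx
            atoms.append({"z": 6, "degree": 1, "aromatic": arom,
                          "h": h_aromatic if arom else h_methyl})
    atoms.append({"z": 0, "degree": 1, "aromatic": False, "h": 0})
    return atoms


def ranked_symbols(atoms, use_h_count):
    """Sort atoms by a simplified invariant and return their symbols.

    Aromaticity is deliberately left out of the invariant. Remaining ties
    fall back to the input index, which makes the result order-dependent.
    """
    def invariant(i):
        a = atoms[i]
        key = (a["z"], a["degree"])
        if use_h_count:
            key += (a["h"],)
        return key + (i,)

    def symbol(a):
        if a["z"] == 0:
            return "*"
        return "c" if a["aromatic"] else "C"

    return [symbol(atoms[i]) for i in sorted(range(len(atoms)), key=invariant)]
```

With `use_h_count=False`, `make_fragment(0)` and `make_fragment(2)` rank as `['*', 'c', 'C', 'C']` and `['*', 'C', 'c', 'C']`; with H counts included both give `['*', 'c', 'C', 'C']`. Passing `h_aromatic=3` to both reproduces note 2): with equal H counts the tie remains and the ordering again depends on the input indices.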