Re: [Rdkit-discuss] changes in chirality in rdkit?

2020-07-15 Thread Bennion, Brian via Rdkit-discuss
OK Greg, thank you for the sneak preview.

brian


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] changes in chirality in rdkit?

2020-07-15 Thread Greg Landrum
Hi Brian,

I think you're misinterpreting the drawings. Those two images look like
they correspond to the same molecule.

The easiest way to check things like this without having to interpret
drawings is to use Chem.FindMolChiralCenters, which will show you absolute
stereo labels for all stereoatoms:

In [2]: m1 = Chem.MolFromSmiles('OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
In [3]: m2 = Chem.MolFromSmiles('OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1')
In [4]: Chem.FindMolChiralCenters(m1)
Out[4]: [(2, 'R'), (4, 'S'), (16, 'R')]
In [5]: Chem.FindMolChiralCenters(m2)
Out[5]: [(2, 'R'), (5, 'S'), (9, 'R')]

Here's the substructure mapping between those molecules:

In [16]: match = m1.GetSubstructMatch(m2)
In [17]: match[2]
Out[17]: 2
In [18]: match[5]
Out[18]: 4
In [19]: match[9]
Out[19]: 16


The "R" and "S" labels that function produces are not necessarily correct
according to IUPAC rules (though in this case they are), but they are
consistently calculated.
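
The comparison above can also be scripted. What follows is a minimal sketch, not part of the original exchange: it maps the stereocenters of m2 onto m1 through the substructure match and checks that the labels agree. The helper name stereo_labels_agree is made up for illustration, and the SMILES are written with the pyrrolidine ring spelled out (N3CCCC3 / N2CCCC2), which is an assumption consistent with the atom indices reported above.

```python
from rdkit import Chem

# Assumption: pyrrolidine ring written out in full (see the note above).
SMI_ORIGINAL = 'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
SMI_RDKIT = 'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'


def stereo_labels_agree(target, query):
    """Hypothetical helper: True if every labelled stereocenter in `query`
    carries the same label on the atom it maps to in `target`."""
    match = target.GetSubstructMatch(query)
    if not match or target.GetNumAtoms() != query.GetNumAtoms():
        return False  # no match, or the molecules differ in size
    target_labels = dict(Chem.FindMolChiralCenters(target))
    query_labels = dict(Chem.FindMolChiralCenters(query))
    # match[i] is the index in `target` of atom i of `query`.
    return all(target_labels.get(match[idx]) == label
               for idx, label in query_labels.items())


m1 = Chem.MolFromSmiles(SMI_ORIGINAL)
m2 = Chem.MolFromSmiles(SMI_RDKIT)
agree = stereo_labels_agree(m1, m2)
print(agree)
```

Only the first substructure match is used, so for molecules whose stereocenters are related by symmetry this check would need more care.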

As a preview: the 2020.09 RDKit release will include a new CIP calculator
using the algorithm described in this paper:
https://pubs.acs.org/doi/10.1021/acs.jcim.8b00324. The new code, which I
think will be quite helpful for people who need CIP labels, was implemented
by Ricardo Rodriguez Schmidt at Schrödinger and derived from John
Mayfield's Java implementation (https://github.com/SiMolecule/centres).
Here's what it says about your molecules:

In [6]: from rdkit.Chem import rdCIPLabeler
In [7]: rdCIPLabeler.AssignCIPLabels(m1)
In [8]: rdCIPLabeler.AssignCIPLabels(m2)
In [9]: [(i,x.GetProp("_CIPCode")) for i,x in enumerate(m1.GetAtoms()) if x.HasProp('_CIPCode')]
Out[9]: [(2, 'R'), (4, 'S'), (16, 'R')]
In [10]: [(i,x.GetProp("_CIPCode")) for i,x in enumerate(m2.GetAtoms()) if x.HasProp('_CIPCode')]
Out[10]: [(2, 'R'), (5, 'S'), (9, 'R')]
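
For anyone who wants the two sets of labels side by side, here is a small sketch (an illustration only, assuming RDKit 2020.09 or later so that rdCIPLabeler is available). The function name legacy_and_new_cip is hypothetical, and the SMILES spells out the pyrrolidine ring as N3CCCC3, which is an assumption consistent with the atom indices shown above.

```python
from rdkit import Chem
from rdkit.Chem import rdCIPLabeler


def legacy_and_new_cip(smiles):
    """Return (older_labels, new_labels) as lists of (atom index, label)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError('could not parse SMILES: ' + smiles)
    # Labels as reported by FindMolChiralCenters, read before the new
    # labeler overwrites the _CIPCode property.
    older = Chem.FindMolChiralCenters(mol)
    # The new labeler stores its results in the same _CIPCode atom property.
    rdCIPLabeler.AssignCIPLabels(mol)
    new = [(atom.GetIdx(), atom.GetProp('_CIPCode'))
           for atom in mol.GetAtoms() if atom.HasProp('_CIPCode')]
    return older, new


older, new = legacy_and_new_cip(
    'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
print(older)
print(new)
```

For this molecule both lists should match the output shown above; molecules with more intricate stereochemistry are where the two calculators may disagree.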


Best,
-greg



Re: [Rdkit-discuss] changes in chirality in rdkit?

2020-07-14 Thread Bennion, Brian via Rdkit-discuss
Hello Rafal,

Nice to see you on this forum.  I completely expect reordering of SMILES 
strings from program to program.  This is why I like to convert them to images.

The tetrahydrofuran with the hydroxymethyl group has inverted stereochemistry 
of its substituents.

The original string is shown in the first image.

[image: image001.png]

The second string after RDKit processing is shown in the second image here.

[image: image002.png]



Re: [Rdkit-discuss] changes in chirality in rdkit?

2020-07-14 Thread Rafal Roszak
Hello Brian,


> The original smiles string
> "OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1"
> 
> after conversion with rdkit
> "OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1"

After visualisation, it seems to me that both SMILES represent the same
structure (the stereochemistry is the same; only the orientation of the
molecule is different). Canonical SMILES from RDKit are not always the same
as canonical SMILES from other programs. If you want to preserve the atom
order, you can try the option canonical=False. See the example below:

>>> mol1=Chem.MolFromSmiles('OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
>>> mol2=Chem.MolFromSmiles('OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1')
>>> Chem.MolToSmiles(mol1)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
>>> Chem.MolToSmiles(mol2)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
# The canonical SMILES are the same for both inputs (above), but without
# canonicalisation you get different SMILES:
>>> Chem.MolToSmiles(mol1, canonical=False)
'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
>>> Chem.MolToSmiles(mol2, canonical=False)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
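
The same check can be wrapped in a small function. This is only a sketch: the name same_molecule is made up for illustration, the SMILES are written with the pyrrolidine ring spelled out (an assumption about the strings in this thread), and the third string is the original with its first stereocenter deliberately inverted, to show that a real change in stereochemistry does alter the canonical SMILES.

```python
from rdkit import Chem


def same_molecule(smiles_a, smiles_b):
    """Hypothetical helper: compare two SMILES via RDKit canonical SMILES."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError('could not parse one of the SMILES')
    # MolToSmiles writes canonical SMILES, including stereo, by default.
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)


original = 'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
converted = 'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
# Same constitution, first stereocenter deliberately inverted.
epimer = 'OC[C@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'

print(same_molecule(original, converted))
print(same_molecule(original, epimer))
```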


Best,

Rafal





[Rdkit-discuss] changes in chirality in rdkit?

2020-07-13 Thread Bennion, Brian via Rdkit-discuss
Hello,
I am "translating" SMILES strings, written to a CSV file by another program, 
into RDKit canonical SMILES with the code below.
If I am doing something incorrectly, I would appreciate the input.
Thanks,
Brian Bennion


The original SMILES string:

"OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1"


After conversion with RDKit:

"OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1"

my code is below.

import re
from rdkit import Chem

# inFile, rdkitsmichiout and count are defined elsewhere in the script.
# Matches bracket atoms written charge-first, e.g. [N+H3].
protn_pat = re.compile(r'\[([IBnN])\+(@*)(H[1234]*)*\]')

line = inFile.readline()
while len(line) != 0:
    fields = line.replace('","', ' ').split()
    mol_name = fields[2]
    molMOE = fields[3].replace('"', '')
    mol1check = protn_pat.search(molMOE)
    if mol1check is not None:
        print("Found crazy MOE string", mol1check, molMOE)
        # Move the charge to the end of the bracket atom.
        mol1 = protn_pat.sub(r'[\1\3\2+]', molMOE)
    else:
        mol1 = molMOE
    try:
        mol = Chem.MolFromSmiles(mol1)
    except Exception:
        mol = None
    if mol is None:
        print('mol failed:' + molMOE + ' ' + mol1 + ' ' + str(count) + '\n')
    else:
        # Compute the InChI once and reuse it for the InChIKey.
        inchi_str = Chem.inchi.MolToInchi(mol, options='/FixedH')
        rdkitsmichiout.write('"' + Chem.MolToSmiles(mol, isomericSmiles=True) + '",')
        rdkitsmichiout.write('"' + inchi_str + '",')
        rdkitsmichiout.write('"' + Chem.inchi.InchiToInchiKey(inchi_str) + '"\n')
    line = inFile.readline()  # advance to the next record
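
To make the effect of the regular expression easier to see, here is a short standalone sketch using only the standard library. The input strings are hypothetical; whether the other program writes exactly these charge-first forms is an assumption. The second half shows a variant replacement order, offered only as something to check: SMILES bracket atoms put the chirality mark before the hydrogen count, so the order as posted would misplace an "@" if one were present.

```python
import re

# Pattern copied from the script above.
protn_pat = re.compile(r'\[([IBnN])\+(@*)(H[1234]*)*\]')

# Hypothetical charge-first input without a chirality mark.
example = 'CC[N+H3]'
rewritten = protn_pat.sub(r'[\1\3\2+]', example)
print(rewritten)  # CC[NH3+]

# Hypothetical input that also carries a chirality mark.
chiral_example = 'C[N+@H](CC)CCC'
as_posted = protn_pat.sub(r'[\1\3\2+]', chiral_example)
variant = protn_pat.sub(r'[\1\2\3+]', chiral_example)
print(as_posted)  # C[NH@+](CC)CCC
print(variant)    # C[N@H+](CC)CCC
```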
