Hi

Coming back to this thread as I have found a similar issue with rdkit 17-03/09.

BRICS.BreakBRICSBonds is inverting stereochemistry for some inputs.

>>> from rdkit import Chem
>>> from rdkit.Chem import BRICS
>>> smi='CNc1ccccc1[C@H](C)NC'
>>> mol=Chem.MolFromSmiles(smi)
# we are using rdkit canonicalized smiles
>>> Chem.MolToSmiles(mol,1)
'CNc1ccccc1[C@H](C)NC'
>>> frags=BRICS.BRICSDecompose(mol,returnMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
# input is the first molecule in the list
>>> [Chem.MolToSmiles(m,1) for m in bm]
['CNc1ccccc1[C@H](C)NC', 'CN[C@@H](C)c1ccccc1[C@H](C)NC', 
'CNc1ccccc1-c1ccccc1[C@H](C)NC', 'CNc1ccccc1NC', 'CNc1ccccc1-c1ccccc1NC', 
'CNc1ccccc1-c1ccccc1-c1ccccc1NC']
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
# input is the second in the list with inverted stereochem
>>> [Chem.MolToSmiles(m,1) for m in bm]
['CNc1ccccc1NC', 'CNc1ccccc1[C@@H](C)NC', 'CNc1ccccc1-c1ccccc1NC', 
'CNc1ccccc1-c1ccccc1-c1ccccc1NC', 'CNc1ccccc1-c1ccccc1[C@@H](C)NC', 
'CN[C@H](C)c1ccccc1[C@@H](C)NC']

Interestingly, if I make a small change to the molecule, giving
'COc1ccccc1[C@H](C)NC',
then using the SMILES as written gives the same issue.

>>> mol=Chem.MolFromSmiles('COc1ccccc1[C@H](C)NC')
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
>>> [Chem.MolToSmiles(m,1) for m in bm]
[…, 'CN[C@H](C)c1ccccc1OC', …]

However, this is not the RDKit canonical atom ordering for this molecule.
If I use the RDKit canonical smiles to build the molecule 
('CN[C@@H](C)c1ccccc1OC'), BreakBRICSBonds works fine and I can regenerate the 
initial molecule with BRICSBuild.

>>> mol=Chem.MolFromSmiles('CN[C@@H](C)c1ccccc1OC')
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
>>> [Chem.MolToSmiles(m,1) for m in bm]
[…., 'CN[C@@H](C)c1ccccc1OC', ….]
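
A minimal sketch of how the check above could be scripted, so that the same round trip can be run over many inputs. The helper names (canonical, rebuilt_smiles, survives_round_trip) are hypothetical and not part of the RDKit; only the Chem and BRICS calls already used in this thread are assumed. Whether a given input survives depends on the RDKit version, so no expected output is shown.

```python
from rdkit import Chem
from rdkit.Chem import BRICS


def canonical(smiles):
    """Return the canonical isomeric SMILES for a SMILES string."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles), isomericSmiles=True)


def rebuilt_smiles(smiles):
    """Fragment with BreakBRICSBonds, rebuild with BRICSBuild,
    and return the canonical isomeric SMILES of every product."""
    mol = Chem.MolFromSmiles(smiles)
    frags = Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol), asMols=True)
    products = set()
    for product in BRICS.BRICSBuild(frags):
        # BRICSBuild products are not sanitized, so sanitize before writing.
        Chem.SanitizeMol(product)
        products.add(Chem.MolToSmiles(product, isomericSmiles=True))
    return products


def survives_round_trip(smiles):
    """True if the input molecule, stereochemistry included, is rebuilt."""
    return canonical(smiles) in rebuilt_smiles(smiles)


# SMILES taken from this thread.
for smi in ['CNc1ccccc1[C@H](C)NC', 'COc1ccccc1[C@H](C)NC',
            'CN[C@@H](C)c1ccccc1OC']:
    print(smi, survives_round_trip(smi))
```

A False here only says that the exact input stereoisomer is missing from the rebuilt set; it does not say which step changed it.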

Regards

Stephen

From: Stephen Pickett
Sent: 16 May 2017 09:01
To: Greg Landrum <greg.land...@gmail.com>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: RE: [Rdkit-discuss] Differences in chirality with BRICS fragmentation

Thanks Greg

I’m hoping we can get to 17-03

Stephen

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 16 May 2017 06:22
To: Stephen Pickett
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Differences in chirality with BRICS fragmentation


Hi Stephen,

You're perfectly correct, what you're seeing there is a bug. However you're 
using a two-year old version of the RDKit and a number of bugs in this area 
have been fixed in the intervening time. Still, since there's potentially a lot 
going on here, and I'm always nervous about chirality, I will walk through the 
steps I took to figure out whether or not things work properly now for this 
case.

Let's start with making sure that the fragmentation works correctly:
In [18]: mol=Chem.MolFromSmiles('C1CCOC[C@H]1NC')

In [19]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [20]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[20]: ['[5*]NC', '[15*][C@H]1CCCOC1']

In [21]: mol=Chem.MolFromSmiles('C1CCOC[C@@H]1NC')

In [22]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [23]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[23]: ['[5*]NC', '[15*][C@@H]1CCCOC1']

Those both look ok, but we should try another input SMILES for the same 
molecule to make sure it's still ok:

In [24]: mol=Chem.MolFromSmiles('CN[C@H]1CCCOC1')

In [25]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [26]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[26]: ['[5*]NC', '[15*][C@H]1CCCOC1']

Just to be really sure, let's reorder the bonds at the chiral center again, 
making sure to keep the same stereochemistry:
In [27]: mol=Chem.MolFromSmiles('CN[C@@H]1COCCC1')

In [28]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [29]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[29]: ['[5*]NC', '[15*][C@H]1CCCOC1']

That also looks good, so we can have some reasonable confidence that 
BRICSDecompose() is doing the right thing.
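
The same comparison can be sketched as a small script rather than separate IPython cells. This is only an illustration: fragment_smiles is a hypothetical helper name, and the three input SMILES are the ones used above for the same stereoisomer. The output depends on the RDKit version, so none is shown.

```python
from rdkit import Chem
from rdkit.Chem import BRICS

# Three ways of writing the same stereoisomer, all taken from this thread.
EQUIVALENT_SMILES = ['C1CCOC[C@H]1NC', 'CN[C@H]1CCCOC1', 'CN[C@@H]1COCCC1']


def fragment_smiles(smiles):
    """Return the sorted isomeric SMILES of the BRICS fragments."""
    mol = Chem.MolFromSmiles(smiles)
    frags = BRICS.BRICSDecompose(mol, returnMols=True)
    return sorted(Chem.MolToSmiles(f, isomericSmiles=True) for f in frags)


results = {smi: fragment_smiles(smi) for smi in EQUIVALENT_SMILES}
for smi, frags in results.items():
    print(smi, frags)

# If the fragmentation is independent of atom ordering, every input
# should give the same fragment list.
consistent = len({tuple(frags) for frags in results.values()}) == 1
print('consistent fragments:', consistent)
```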

BreakBRICSBonds() is used by BRICSDecompose(), so we'd expect that to work too:

In [31]: Chem.MolToSmiles(BRICS.BreakBRICSBonds(mol),isomericSmiles=True)
Out[31]: '[15*][C@H]1CCCOC1.[5*]NC'

and it does.

Now let's try putting molecules back together:

In [37]: mol=Chem.MolFromSmiles('CN[C@H]1CCCOC1')

In [38]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [39]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in 
BRICS.BRICSBuild(frags)]
Out[39]: ['CN[C@H]1CCCOC1']

That looks ok, what about the other way of writing the SMILES?

In [40]: mol=Chem.MolFromSmiles('CN[C@@H]1COCCC1')

In [41]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [42]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in 
BRICS.BRICSBuild(frags)]
Out[42]: ['CN[C@H]1CCCOC1']

Those also look ok; the bug that was in the older RDKit version has been fixed. 
I'd really suggest either updating to a newer version of the RDKit yourself or 
talking to your IT group and asking them to do the update. We can provide help 
on that here on the mailing list, or if you'd rather do it less publicly, 
commercial support is available for the RDKit; please contact me at 
greg.land...@t5informatics.com to talk about that.
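
A quick way to confirm which RDKit version a given Python environment is actually using, sketched here on the assumption that the installed build exposes both attributes (recent releases do):

```python
import rdkit
from rdkit import rdBase

# Release string of the installed RDKit package.
rdkit_version = rdkit.__version__
print('rdkit.__version__  :', rdkit_version)

# The same information as reported by the C++ backend.
print('rdBase.rdkitVersion:', rdBase.rdkitVersion)
```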

Best,
-greg

On Fri, May 12, 2017 at 10:37 AM, Stephen Pickett 
<stephen.d.pick...@gsk.com> wrote:
Hi

I have come across a difference in behaviour with the BRICS algorithms 
depending on how the molecule is fragmented when using non-canonical smiles 
input.
RDKIT 2015_03, Python 2.7.10

BRICSDecompose gives back the starting chirality

>>> from rdkit import Chem
>>> from rdkit.Chem import BRICS
>>> smi='C1CCOC[C@H]1NC'
>>> mol=Chem.MolFromSmiles(smi)
>>> cansmi=Chem.MolToSmiles(mol,1)
>>> cansmi
'CN[C@H]1CCCOC1'
>>> frags=BRICS.BRICSDecompose(mol,returnMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
>>> [Chem.MolToSmiles(m,1) for m in bm]
['CN[C@H]1CCCOC1']

BreakBRICSBonds inverts the centre.
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
>>> [Chem.MolToSmiles(m,1) for m in bm]
['CN[C@@H]1CCCOC1']

Starting from the canonical smiles works fine
>>> smi='CN[C@H]1CCCOC1'
>>> mol=Chem.MolFromSmiles(smi)
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
>>> [Chem.MolToSmiles(m,1) for m in bm]
['CN[C@H]1CCCOC1']

The inversion happens in BreakBRICSBonds
>>> smi='C1CCOC[C@H]1NC'
>>> mol=Chem.MolFromSmiles(smi)
>>> Chem.MolToSmiles(BRICS.BreakBRICSBonds(mol),1)
'[15*][C@@H]1CCCOC1.[5*]NC'

Using the pre canonicalised SMILES is clearly the way to go, but thought that 
this might be indicative of an issue somewhere.
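
A minimal sketch of the pre-canonicalisation workaround described above: parse the SMILES, write it back out as canonical isomeric SMILES, and re-parse before fragmenting. The helper name canonicalised_mol is hypothetical. This only reorders the atoms; it does not fix the underlying behaviour, and a later message in this thread reports an input where canonical SMILES alone was not enough.

```python
from rdkit import Chem
from rdkit.Chem import BRICS


def canonicalised_mol(smiles):
    """Parse a SMILES, then rebuild the molecule from its canonical
    isomeric SMILES so that atoms are in canonical order."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError('could not parse SMILES: %r' % smiles)
    return Chem.MolFromSmiles(Chem.MolToSmiles(mol, isomericSmiles=True))


# Non-canonical input from this thread.
mol = canonicalised_mol('C1CCOC[C@H]1NC')
broken = BRICS.BreakBRICSBonds(mol)
print(Chem.MolToSmiles(broken, isomericSmiles=True))
```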

Regards

Stephen

________________________________

This e-mail was sent by GlaxoSmithKline Services Unlimited
(registered in England and Wales No. 1047315), which is a
member of the GlaxoSmithKline group of companies. The
registered address of GlaxoSmithKline Services Unlimited
is 980 Great West Road, Brentford, Middlesex TW8 9GS.

GSK monitors email communications sent to and from GSK in order to protect GSK, 
our employees, customers, suppliers and business partners, from cyber threats 
and loss of GSK Information. GSK monitoring is conducted with appropriate 
confidentiality controls and in accordance with local laws and after 
appropriate consultation.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

