Re: [Rdkit-discuss] Differences in chirality with BRICS fragmentation

2018-02-07 Thread Greg Landrum
On Wed, Feb 7, 2018 at 4:36 PM, Stephen Pickett wrote:

> Thanks for taking a look.

If you want to keep an eye on what's going on, here's the bug:
https://github.com/rdkit/rdkit/issues/1734


> FYI, I hope to include a section about how we are using this algorithm at
> the UK QSAR meeting in Cardiff in April.

It should all work as long as you stick to the reaction-based functions,
such as BRICS.BRICSDecompose().

It would be great if you could share the slides when you've got that
presentation put together!

-greg
--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Differences in chirality with BRICS fragmentation

2018-02-07 Thread Stephen Pickett
Hi Greg

Thanks for taking a look.
FYI, I hope to include a section about how we are using this algorithm at the 
UK QSAR meeting in Cardiff in April.

Stephen


Re: [Rdkit-discuss] Differences in chirality with BRICS fragmentation

2018-02-07 Thread Greg Landrum
It's no fair reviving old items on difficult topics like stereochemistry!
;-)

This is due to a bug in BRICS.BreakBRICSBonds(): stereochemistry isn't
handled correctly.
I have to admit that I'm surprised by this: I expected that this code would
behave properly, but it clearly doesn't. That's a bug for me to look into.

Your other approach, using BRICS.BRICSDecompose(), uses different machinery,
the ChemicalReaction code, to fragment the molecules. This does a better
job of handling stereochemistry.
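
To see whether a given RDKit installation is affected, the fragments from the two routes can be compared side by side. The following is a minimal sketch, not part of the original message; the helper names `fragments_via_reactions` and `fragments_via_bond_breaking` are hypothetical, and the test SMILES is the one from the original report in this thread.

```python
from rdkit import Chem
from rdkit.Chem import BRICS


def fragments_via_reactions(mol):
    """Fragment SMILES from BRICSDecompose(), the reaction-based route."""
    # By default BRICSDecompose() returns a collection of fragment SMILES.
    return sorted(Chem.CanonSmiles(smi) for smi in BRICS.BRICSDecompose(mol))


def fragments_via_bond_breaking(mol):
    """Fragment SMILES from BreakBRICSBonds(), split into separate molecules."""
    broken = BRICS.BreakBRICSBonds(mol)
    pieces = Chem.GetMolFrags(broken, asMols=True)
    # Re-canonicalize so both routes are compared on the same footing.
    return sorted(
        Chem.CanonSmiles(Chem.MolToSmiles(p, isomericSmiles=True)) for p in pieces
    )


mol = Chem.MolFromSmiles('C1CCOC[C@H]1NC')
print(fragments_via_reactions(mol))
print(fragments_via_bond_breaking(mol))
```

If the two printed lists differ only in `@` versus `@@`, the installed version shows the inversion discussed here; if they match, it does not show it for this input.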

Thanks for pointing this out and sorry for the quite-delayed reply.

-greg
p.s. In my reply when this thread originally came up I said that
BRICSDecompose() uses BreakBRICSBonds(). That is incorrect; I wrote that
email too quickly.


Re: [Rdkit-discuss] Differences in chirality with BRICS fragmentation

2018-01-10 Thread Stephen Pickett
Hi

Coming back to this thread as I have found a similar issue with rdkit 17-03/09.

BRICS.BreakBRICSBonds is inverting stereochemistry for some inputs.

>>> smi='CNc1c1[C@H](C)NC'
>>> mol=Chem.MolFromSmiles(smi)
# we are using rdkit canonicalized smiles
>>> Chem.MolToSmiles(mol,1)
'CNc1c1[C@H](C)NC'
>>> frags=BRICS.BRICSDecompose(mol,returnMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
# input is the first molecule in the list
>>> [Chem.MolToSmiles(m,1) for m in bm]
['CNc1c1[C@H](C)NC', 'CN[C@@H](C)c1c1[C@H](C)NC', 
'CNc1c1-c1c1[C@H](C)NC', 'CNc1c1NC', 'CNc1c1-c1c1NC', 
'CNc1c1-c1c1-c1c1NC']
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
# input is the second in the list with inverted stereochem
>>> [Chem.MolToSmiles(m,1) for m in bm]
['CNc1c1NC', 'CNc1c1[C@@H](C)NC', 'CNc1c1-c1c1NC', 
'CNc1c1-c1c1-c1c1NC', 'CNc1c1-c1c1[C@@H](C)NC', 
'CN[C@H](C)c1c1[C@@H](C)NC']

Interestingly, if I make a small change to the molecule
('COc1c1[C@H](C)NC'), using the SMILES as written gives the same issue.

>>> mol=Chem.MolFromSmiles('COc1c1[C@H](C)NC')
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
>>> [Chem.MolToSmiles(m,1) for m in bm]
[…, 'CN[C@H](C)c1c1OC', …]

However, this is not the RDKit canonical atom ordering for this molecule.
If I use the RDKit canonical smiles to build the molecule 
('CN[C@@H](C)c1c1OC'), BreakBRICSBonds works fine and I can regenerate the 
initial molecule with BRICSBuild.

>>> mol=Chem.MolFromSmiles('CN[C@@H](C)c1c1OC')
>>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
>>> bm=list(BRICS.BRICSBuild(frags))
>>> [Chem.MolToSmiles(m,1) for m in bm]
[…., 'CN[C@@H](C)c1c1OC', ….]

Regards

Stephen
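
The check performed by hand above (does the input reappear among the BRICSBuild products?) can be wrapped in a small helper. This is a sketch under stated assumptions rather than code from the thread: the function names are hypothetical, and because the aromatic SMILES in this message do not parse as they appear in the archive, the example uses the SMILES from the original report in this thread instead.

```python
from rdkit import Chem
from rdkit.Chem import BRICS


def decompose_route(mol):
    # Reaction-based fragmentation.
    return BRICS.BRICSDecompose(mol, returnMols=True)


def break_bonds_route(mol):
    # Direct bond breaking, followed by splitting into separate molecules.
    return Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol), asMols=True)


def survives_round_trip(smiles, fragmenter):
    """Return True if rebuilding the fragments regenerates the input molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    target = Chem.MolToSmiles(mol, isomericSmiles=True)
    rebuilt = set()
    for product in BRICS.BRICSBuild(fragmenter(mol)):
        # BRICSBuild() products are not sanitized, so sanitize before writing SMILES.
        Chem.SanitizeMol(product)
        rebuilt.add(Chem.MolToSmiles(product, isomericSmiles=True))
    return target in rebuilt


for name, route in [('BRICSDecompose', decompose_route),
                    ('BreakBRICSBonds', break_bonds_route)]:
    print(name, survives_round_trip('C1CCOC[C@H]1NC', route))
```

A `False` for one route and `True` for the other on the same input would reproduce the discrepancy reported in this message.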


Re: [Rdkit-discuss] Differences in chirality with BRICS fragmentation

2017-05-16 Thread Stephen Pickett
Thanks Greg

I’m hoping we can get to 17-03

Stephen


Re: [Rdkit-discuss] Differences in chirality with BRICS fragmentation

2017-05-15 Thread Greg Landrum
Hi Stephen,

You're perfectly correct, what you're seeing there is a bug. However, you're
using a two-year-old version of the RDKit, and a number of bugs in this area
have been fixed in the intervening time. Still, since there's potentially a
lot going on here, and I'm always nervous about chirality, I will walk
through the steps I took to figure out whether or not things work properly
now for this case.

Let's start by making sure that the fragmentation works correctly:

In [18]: mol=Chem.MolFromSmiles('C1CCOC[C@H]1NC')

In [19]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [20]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[20]: ['[5*]NC', '[15*][C@H]1CCCOC1']

In [21]: mol=Chem.MolFromSmiles('C1CCOC[C@@H]1NC')

In [22]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [23]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[23]: ['[5*]NC', '[15*][C@@H]1CCCOC1']


Those both look ok, but we should try another input SMILES for the same
molecule to make sure it's still ok:

In [24]: mol=Chem.MolFromSmiles('CN[C@H]1CCCOC1')

In [25]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [26]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[26]: ['[5*]NC', '[15*][C@H]1CCCOC1']


Just to be really sure, let's reorder the bonds at the chiral center again,
making sure to keep the same stereochemistry:

In [27]: mol=Chem.MolFromSmiles('CN[C@@H]1COCCC1')

In [28]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [29]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in frags]
Out[29]: ['[5*]NC', '[15*][C@H]1CCCOC1']


That also looks good, so we can have some reasonable confidence that
BRICSDecompose() is doing the right thing.
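
The manual checks above follow one pattern: write the same molecule several ways, confirm the writings really are equivalent, then compare the fragments. A rough sketch of that pattern follows; the helper names `same_molecule` and `fragment_sets` are hypothetical, and the three SMILES are the ones used in the session above.

```python
from rdkit import Chem
from rdkit.Chem import BRICS


def same_molecule(smiles_list):
    """True if every SMILES in the list canonicalizes to the same string."""
    canonical = {Chem.CanonSmiles(smi) for smi in smiles_list}
    return len(canonical) == 1


def fragment_sets(smiles_list):
    """Map each input SMILES to its sorted BRICSDecompose() fragment SMILES."""
    result = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        result[smi] = sorted(BRICS.BRICSDecompose(mol))
    return result


variants = ['C1CCOC[C@H]1NC', 'CN[C@H]1CCCOC1', 'CN[C@@H]1COCCC1']
# Guard against comparing different stereoisomers by accident.
assert same_molecule(variants)
for smi, frags in fragment_sets(variants).items():
    print(smi, frags)
```

If fragmentation is independent of atom ordering, every line printed should show the same fragment list.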

BreakBRICSBonds() is used by BRICSDecompose(), so we'd expect that to work
too:


In [31]: Chem.MolToSmiles(BRICS.BreakBRICSBonds(mol),isomericSmiles=True)
Out[31]: '[15*][C@H]1CCCOC1.[5*]NC'


and it does.

Now let's try putting molecules back together:

In [37]: mol=Chem.MolFromSmiles('CN[C@H]1CCCOC1')

In [38]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [39]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in BRICS.BRICSBuild(frags)]
Out[39]: ['CN[C@H]1CCCOC1']


That looks ok, what about the other way of writing the SMILES?

In [40]: mol=Chem.MolFromSmiles('CN[C@@H]1COCCC1')

In [41]: frags=BRICS.BRICSDecompose(mol,returnMols=True)

In [42]: [Chem.MolToSmiles(x,isomericSmiles=True) for x in BRICS.BRICSBuild(frags)]
Out[42]: ['CN[C@H]1CCCOC1']


Those also look ok; the bug that was in the older RDKit version has been
fixed. I'd really suggest either updating to a newer version of the RDKit
yourself or talking to your IT group and asking them to do the update. We
can provide help on that here on the mailing list. If you'd rather do it
less publicly, commercial support is available for the RDKit; please
contact me at greg.land...@t5informatics.com to talk about that.

Best,
-greg






On Fri, May 12, 2017 at 10:37 AM, Stephen Pickett wrote:

> Hi
>
> I have come across a difference in behaviour with the BRICS algorithms
> depending on how the molecule is fragmented when using non-canonical smiles
> input.
>
> RDKIT 2015_03, Python 2.7.10
>
> BRICSDecompose gives back the starting chirality
>
> >>> smi='C1CCOC[C@H]1NC'
> >>> mol=Chem.MolFromSmiles(smi)
> >>> cansmi=Chem.MolToSmiles(mol,1)
> >>> cansmi
> 'CN[C@H]1CCCOC1'
> >>> frags=BRICS.BRICSDecompose(mol,returnMols=True)
> >>> bm=list(BRICS.BRICSBuild(frags))
> >>> [Chem.MolToSmiles(m,1) for m in bm]
> ['CN[C@H]1CCCOC1']
>
> BreakBRICSBonds inverts the centre.
>
> >>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
> >>> bm=list(BRICS.BRICSBuild(frags))
> >>> [Chem.MolToSmiles(m,1) for m in bm]
> ['CN[C@@H]1CCCOC1']
>
> Starting from the canonical smiles works fine
>
> >>> smi='CN[C@H]1CCCOC1'
> >>> mol=Chem.MolFromSmiles(smi)
> >>> frags=Chem.GetMolFrags(BRICS.BreakBRICSBonds(mol),asMols=True)
> >>> bm=list(BRICS.BRICSBuild(frags))
> >>> [Chem.MolToSmiles(m,1) for m in bm]
> ['CN[C@H]1CCCOC1']
>
> The inversion happens in BreakBRICSBonds
>
> >>> smi='C1CCOC[C@H]1NC'
> >>> mol=Chem.MolFromSmiles(smi)
> >>> Chem.MolToSmiles(BRICS.BreakBRICSBonds(mol),1)
> '[15*][C@@H]1CCCOC1.[5*]NC'
>
> Using the pre canonicalised SMILES is clearly the way to go, but thought
> that this might be indicative of an issue somewhere.
>
> Regards
>
> Stephen
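
The pre-canonicalisation mentioned at the end of this report can be sketched as a round trip through canonical SMILES, which renumbers the atoms into RDKit's canonical output order. The helper name `canonically_ordered` is hypothetical. Note that a later message in this thread reports the same inversion for input that was already canonical, so this should be read as a way to normalise input, not as a reliable fix.

```python
from rdkit import Chem


def canonically_ordered(smiles):
    """Parse a SMILES and return a molecule whose atoms follow canonical SMILES order."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    # Writing canonical SMILES and parsing it again renumbers the atoms.
    return Chem.MolFromSmiles(Chem.MolToSmiles(mol, isomericSmiles=True))


mol = canonically_ordered('C1CCOC[C@H]1NC')
print([atom.GetSymbol() for atom in mol.GetAtoms()])
```

Two different writings of the same molecule passed through this helper end up with identical atom ordering, which removes input ordering as a variable when comparing fragmentation results.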