Hi Paolo,
That's cool, thanks. This may also help me solve my problem of the R-group
labels not taking the actual R-group numbering into account (i.e. if a molecule
has R8 and R5 as its only R-group definitions, they get the labels R1 and R2).
I was also in contact with Brian Kelley, and he suggested fixing it in the
underlying codebase, so I hope this will be fixed in the next version :)
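The R-group numbers the user actually assigned are still available on the core itself. A minimal sketch, assuming the R groups are written as atom map numbers on dummy atoms, as in the SMILES cores in this thread (cores read from an MDL R-group file may carry their labels differently); `core_rgroup_numbers` is a hypothetical helper, not part of RDKit:

```python
from rdkit import Chem


def core_rgroup_numbers(core):
    """Hypothetical helper: return the R-group numbers stored as atom map
    numbers on the dummy atoms of a core, sorted ascending."""
    return sorted(
        atom.GetAtomMapNum()
        for atom in core.GetAtoms()
        # dummy atoms (atomic number 0) that carry a non-zero map number
        if atom.GetAtomicNum() == 0 and atom.GetAtomMapNum() > 0
    )


core = Chem.MolFromSmiles('n1ccc([*:8])cc([*:5])1')
print(core_rgroup_numbers(core))  # [5, 8]
```

A list like this could serve as a starting point for renaming the R1, R2, ... columns by hand whenever the generated labels do not match the numbering in the core.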
Cheers
Nik


From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Sent: Thursday, December 13, 2018 11:09 AM
To: Stiefl, Nikolaus <nikolaus.sti...@novartis.com>; RDKit Discuss 
<rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] RGroup matching in RGroup decomposition code


Hi Nik,

There is a way to achieve what you describe, even though it is slightly 
cumbersome:

from rdkit import Chem
from rdkit.Chem import rdmolops
from rdkit.Chem.Draw import MolsToGridImage, IPythonConsole
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition, RGroupDecompositionParameters)

smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1ccccn1',
        'Nc1ccc(Br)cn1', 'c1ccncc1']
mols = [Chem.MolFromSmiles(smi) for smi in smis]
MolsToGridImage(mols)
[image: grid of the five input molecules]

params = RGroupDecompositionParameters()
# rather than using the built-in flag, we will manually
# adjust the query in two steps using AdjustQueryProperties()
params.onlyMatchAtRGroups = False

# R groups are labelled only via atom map numbers
core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')

# make the dummies queries
core1_params = rdmolops.AdjustQueryParameters()
core1_params.makeDummiesQueries = True
core1_params.adjustDegree = False
core1 = rdmolops.AdjustQueryProperties(core1, core1_params)

# change the atoms connected to the dummies into dummies;
# collect them first, so that an atom which has just been turned
# into a dummy does not cause its own neighbours to be converted too
former_atomic_nums = {}
for a in core1.GetAtoms():
    if a.GetAtomicNum() == 0:
        for nbr in a.GetNeighbors():
            if nbr.GetAtomicNum() != 0:
                former_atomic_nums[nbr.GetIdx()] = nbr.GetAtomicNum()
for i in former_atomic_nums:
    core1.GetAtomWithIdx(i).SetAtomicNum(0)

# this has the same effect as setting onlyMatchAtRGroups to True,
# but we can avoid applying it to the atoms connected to the R groups
core1_params.adjustHeavyDegreeFlags = Chem.ADJUST_IGNOREDUMMIES
core1_params.makeDummiesQueries = False
core1_params.adjustDegree = False
core1_params.adjustHeavyDegree = True
core1 = rdmolops.AdjustQueryProperties(core1, core1_params)

# restore the original atomic numbers
for i, an in former_atomic_nums.items():
    core1.GetAtomWithIdx(i).SetAtomicNum(an)

rg1 = RGroupDecomposition(core1, params)
failMols = []
for m in mols:
    res = rg1.Add(m)
    if res < 0:
        failMols.append(m)
rg1.Process()
# -> True
print("FailedMols: %s"%" ".join([Chem.MolToSmiles(m) for m in failMols]))
# -> FailedMols: Nc1ccc(Br)cn1
core1
[image: depiction of the adjusted core1 query]

d = rg1.GetRGroupsAsColumns(asSmiles=False)
MolsToGridImage(d['Core'])
[image: grid of the decomposed cores]
MolsToGridImage(d['R1'])
[image: grid of the R1 groups]
MolsToGridImage(d['R2'])
[image: grid of the R2 groups]

Hope that helps, cheers
p.
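For reuse, the two-pass query adjustment in Paolo's reply can be wrapped in one function. This is a hedged sketch, not an RDKit API: the name `adjust_core_for_rgroups` is hypothetical, and it assumes (as the workaround does) that `ADJUST_IGNOREDUMMIES` makes the second pass skip every atom whose atomic number is 0. The atoms next to the R-group dummies are collected before any atomic number is changed, so the temporary conversion cannot spread along the ring.

```python
from rdkit import Chem
from rdkit.Chem import rdmolops


def adjust_core_for_rgroups(core):
    """Hypothetical wrapper around the two AdjustQueryProperties() passes:
    the R-group dummies become queries, and heavy-atom degree constraints are
    added to every core atom except the dummies and the atoms bonded to them.
    Returns a new molecule; the input core is not modified."""
    params = rdmolops.AdjustQueryParameters()
    # pass 1: turn the dummy atoms into queries, no degree constraints
    params.makeDummiesQueries = True
    params.adjustDegree = False
    adjusted = rdmolops.AdjustQueryProperties(core, params)

    # find the atoms bonded to a dummy *before* changing anything
    former_atomic_nums = {}
    for atom in adjusted.GetAtoms():
        if atom.GetAtomicNum() == 0:
            for nbr in atom.GetNeighbors():
                if nbr.GetAtomicNum() != 0:
                    former_atomic_nums[nbr.GetIdx()] = nbr.GetAtomicNum()

    # temporarily disguise them as dummies so that pass 2 skips them
    for idx in former_atomic_nums:
        adjusted.GetAtomWithIdx(idx).SetAtomicNum(0)

    # pass 2: heavy-atom degree constraints on all remaining atoms
    params.makeDummiesQueries = False
    params.adjustHeavyDegree = True
    params.adjustHeavyDegreeFlags = Chem.ADJUST_IGNOREDUMMIES
    adjusted = rdmolops.AdjustQueryProperties(adjusted, params)

    # restore the real elements
    for idx, atomic_num in former_atomic_nums.items():
        adjusted.GetAtomWithIdx(idx).SetAtomicNum(atomic_num)
    return adjusted
```

It would replace the manual steps, e.g. `rg1 = RGroupDecomposition(adjust_core_for_rgroups(core1), params)` with `params.onlyMatchAtRGroups = False` as in the reply above.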

On 12/11/18 11:01, Stiefl, Nikolaus wrote:
Hi all,

I was playing around with the RGroup decomposition code and must say that I am
pretty impressed by it. The fact that one can work directly with an MDL R-group
file and that the output is a pandas DataFrame makes analysis really slick.
Well done!

However, one thing that irritates me is that, when I have R-groups defined in
my core and enforce matching only at R-groups, molecules with hydrogen atoms at
those positions seem to be rejected in the "Add" step. I would expect them to
be included as long as the molecules don't have additional heavy atoms at
positions that are not defined as R-groups in the core.
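The acceptance rule described here (hydrogen allowed at R-group positions, no extra heavy atoms anywhere else on the core) can be written as a plain substructure check, independent of RGroupDecomposition. A minimal sketch, assuming the R groups are the dummy atoms of a SMILES core; `substituents_only_at_rgroups` is a hypothetical helper, not an RDKit API, and it only accepts or rejects molecules rather than producing a decomposition:

```python
from rdkit import Chem


def substituents_only_at_rgroups(mol, core):
    """Hypothetical helper: True if `mol` contains the core scaffold and every
    extra heavy atom hangs off a core atom that carries an R group."""
    dummy_idxs = sorted(a.GetIdx() for a in core.GetAtoms()
                        if a.GetAtomicNum() == 0)
    # core atoms bonded to an R-group dummy are the allowed attachment points
    allowed = {n.GetIdx() for i in dummy_idxs
               for n in core.GetAtomWithIdx(i).GetNeighbors()
               if n.GetAtomicNum() != 0}
    # bare scaffold: the core with its dummies deleted, highest index first
    rw = Chem.RWMol(core)
    for i in reversed(dummy_idxs):
        rw.RemoveAtom(i)
    scaffold = rw.GetMol()
    Chem.SanitizeMol(scaffold)
    # RemoveAtom shifts later indices down by one per removed atom before them
    allowed = {i - sum(1 for d in dummy_idxs if d < i) for i in allowed}
    # uniquify=False keeps symmetry-equivalent mappings (e.g. both ortho
    # positions of pyridine), so every orientation of the core is tried
    for match in mol.GetSubstructMatches(scaffold, uniquify=False):
        matched = set(match)
        ok = True
        for q_idx, m_idx in enumerate(match):
            if q_idx in allowed:
                continue
            for nbr in mol.GetAtomWithIdx(m_idx).GetNeighbors():
                # an unmatched heavy neighbour on a non-R-group atom
                if nbr.GetIdx() not in matched and nbr.GetAtomicNum() > 1:
                    ok = False
        if ok:
            return True
    return False


smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1ccccn1', 'Nc1ccc(Br)cn1',
        'c1ccncc1']
core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')
rejected = [s for s in smis
            if not substituents_only_at_rgroups(Chem.MolFromSmiles(s), core1)]
print(rejected)  # ['Nc1ccc(Br)cn1']
```

Under this rule only the 5-bromo compound is rejected, which is the outcome expected in this message.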

______________ snip ____________________

from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition, RGroupDecompositionParameters)

smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1ccccn1', 'Nc1ccc(Br)cn1',
        'c1ccncc1']
mols = [Chem.MolFromSmiles(smi) for smi in smis]
params = RGroupDecompositionParameters()

params.onlyMatchAtRGroups = True

# R groups are labelled only via atom map numbers
core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')
rg1 = RGroupDecomposition(core1, params)

failMols = []
for m in mols:
    res = rg1.Add(m)
    if res < 0:
        failMols.append(m)

rg1.Process()

print("FailedMols: %s"%" ".join([Chem.MolToSmiles(m) for m in failMols]))

____________ end snip ________________


The output shows that molecules 3-5 are rejected at the "Add" step:

>> FailedMols: Nc1ccccn1 Nc1ccc(Br)cn1 c1ccncc1

For molecule 4 (the 5-bromo-substituted aminopyridine) I agree; however, I
don't understand how I can make sure molecules 3 and 5 are also included. Is
there a magic parameter that I can set?

Cheers
Nik

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
