Re: [Rdkit-discuss] atom mapping in reaction searches

2018-05-13 Thread Greg Landrum
Hi Sebastian,

I'm a bit mystified by this and am going to have to dig around a bit to see
if I can figure out what's going on.

-greg


On Wed, May 9, 2018 at 9:59 PM Sebastian Wandernoth wrote:

> Hey guys,
>
> any chance to get an answer on my issue? Even if the answer is that this
> feature is currently not included in RDKit, it would still be helpful :-)
>
> Best regards
> Sebastian
>
> *Sent:* Tuesday, April 24, 2018 at 09:33
> *From:* "Sebastian Wandernoth"
> *To:* rdkit-discuss@lists.sourceforge.net
> *Subject:* [Rdkit-discuss] atom mapping in reaction searches
> Hey guys,
>
> I'm still working on my search engine for reactions and I'm a bit puzzled
> as to what RDKit does with atom mapping information.
> I'm still working with the PostgreSQL cartridge version 0.73.0, which
> should correspond to release 2017.09.3.
>
> I'm starting off with this example reaction, which is fully mapped:
> ([S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1)
>
> If I use a completely unmapped reaction as the query, I expect to find this
> one. So the following should return TRUE:
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @> reaction_from_smarts('S1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2SC(Cl)=NC=2)=N1');
>
> ... which it does
>
>
> The next step is to map one atom correctly in the query and try again. I still
> expect this to return TRUE:
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @>
> reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
>
> ... which it doesn't
>
>
> With two atoms mapped correctly in the query, I wouldn't expect to get
> different results from the previous try:
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @>
> reaction_from_smarts('[S:1]1C(Cl)=NC=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1]C(Cl)=NC=2)=N1');
>
> ... this one however returns TRUE again
>
>
> In my final try, I included a wrong mapping in the query. I definitely
> would expect to get back FALSE here (I'm mapping one sulfur atom to a
> carbon atom and a nitrogen to an oxygen):
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @>
> reaction_from_smarts('[S:1]1C(Cl)=[N:2]C=C1>>S(=[O:2])(=O)1O[CH2:1]C(C2SC(Cl)=NC=2)=N1');
>
> ... however this returns TRUE yet again.
>
> Playing around with it a bit more, I found that whatever single atom I map
> in the query, I always get back FALSE, and if I map more than one atom, I
> always get back TRUE...
> Does this have something to do with the parameter
> 'rdkit.threshold_unmapped_reactant_atoms'? My suspicion is that RDKit only
> counts how many atoms are mapped and does not compare them to the correct
> mapping. Can you confirm this?
> Is there any way at all to include atom mapping in the query to filter the
> reactions the way I want to?
>
>
> I hope you guys can help me here. Sorry for the lengthy question, but I
> wanted to include as much information as possible for you to pinpoint the
> issue.
>
> Best regards
> Sebastian
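Independently of what the cartridge does internally, the map numbers carried by a query can be inspected directly with the RDKit Python API to see which atoms they pair up. The sketch below is a hypothetical helper (inconsistent_maps is not an RDKit function) built on AllChem.ReactionFromSmarts, Atom.GetAtomMapNum and Atom.GetAtomicNum. It only checks that each map number labels the same element on both sides of the arrow; it does not reproduce the @> search itself.

```python
from rdkit.Chem import AllChem


def _side_maps(templates):
    """Collect {atom-map number: atomic number} over an iterable of template mols."""
    maps = {}
    for tmpl in templates:
        for atom in tmpl.GetAtoms():
            num = atom.GetAtomMapNum()
            if num:  # 0 means the atom is unmapped
                maps[num] = atom.GetAtomicNum()
    return maps


def inconsistent_maps(rxn_smarts):
    """Hypothetical helper: map numbers whose element differs across the arrow."""
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    reactant_maps = _side_maps(rxn.GetReactantTemplate(i)
                               for i in range(rxn.GetNumReactantTemplates()))
    product_maps = _side_maps(rxn.GetProductTemplate(i)
                              for i in range(rxn.GetNumProductTemplates()))
    shared = reactant_maps.keys() & product_maps.keys()
    return sorted(n for n in shared if reactant_maps[n] != product_maps[n])


if __name__ == "__main__":
    two_maps = ("[S:1]1C(Cl)=NC=[CH:2]1>>"
                "S(=O)(=O)1OCC([C:2]2[S:1]C(Cl)=NC=2)=N1")
    wrong_maps = ("[S:1]1C(Cl)=[N:2]C=C1>>"
                  "S(=[O:2])(=O)1O[CH2:1]C(C2SC(Cl)=NC=2)=N1")
    print(inconsistent_maps(two_maps))    # []
    print(inconsistent_maps(wrong_maps))  # [1, 2]
```

For the deliberately wrong query (S mapped to C, N mapped to O), the helper reports both map numbers, even though the cartridge query above still returned TRUE.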
> --
> Check out the vibrant tech community on one of the world's most engaging
> tech sites, Slashdot.org!
> http://sdm.link/slashdot___
> Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] question on rdRGroupDecomposition

2018-05-13 Thread Patrick Walters
Hi All,

I'm hoping someone can help me with rdRGroupDecomposition. I'd like to be
able to specify particular R-group locations AND match cases where R == H. The
example below illustrates what I'm talking about.
When RGroupDecompositionParameters.onlyMatchAtRGroups is set to True, cases
where R == H are skipped. I tried putting an explicit hydrogen on the core to
block a position, but it appears that the explicit hydrogen is ignored.

from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (RGroupDecomposition,
                                               RGroupDecompositionParameters)

# run an RGroupDecomposition on a set of molecules
def process_r_groups(core_mol, rg_params, mols):
    rg = RGroupDecomposition(core_mol, rg_params)
    # add every molecule passed in, then run the decomposition
    for mol in mols:
        rg.Add(mol)
    rg.Process()
    return list(rg.GetRGroupsAsRows(asSmiles=True))


buff = """CCc1ccnc(C)n1
Cc1ncccn1
Cc1cnc(C)nc1"""

mol_list = [Chem.MolFromSmiles(x) for x in buff.split("\n")]
core = Chem.MolFromSmiles("[H]c1cc([2*])nc([1*])n1")
# default parameters, note that 3 R-groups are returned, the
# explicit hydrogen is ignored
params_1 = RGroupDecompositionParameters()
for row in process_r_groups(core, params_1, mol_list):
    print(row)

print()

# run with the onlyMatchAtRGroups parameter
# now only one row is returned
params_2 = RGroupDecompositionParameters()
params_2.onlyMatchAtRGroups = True
for row in process_r_groups(core, params_2, mol_list):
    print(row)

The output from the script above is:

{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H][*:2]', 'R3': '[H]C([H])([H])C([H])([H])[*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H][*:2]', 'R3': '[H][*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H]C([H])([H])[*:2]', 'R3': '[H][*:3]'}

{'Core': 'c1cc([*:2])nc([*:1])n1', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H]C([H])([H])C([H])([H])[*:2]'}

I'd like to figure out how I can only get the substituents at the labeled
positions, but have it match where R1 == H or R2 == H.

Thanks in advance,

Pat
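One possible reason the explicit hydrogen on the core seems to be ignored (an assumption, not confirmed in this thread) is that Chem.MolFromSmiles removes explicit hydrogens during parsing by default, so the [H] atom never reaches RGroupDecomposition. The minimal sketch below checks this by counting the core's atoms with and without H removal, using Chem.SmilesParserParams and its removeHs flag. count_atoms is a hypothetical helper, and whether RGroupDecomposition honours a hydrogen kept this way is not shown here.

```python
from rdkit import Chem

CORE_SMILES = "[H]c1cc([2*])nc([1*])n1"  # core from the message above


def count_atoms(smiles, remove_hs=True):
    """Hypothetical helper: atom count of the parsed molecule, with or without H removal."""
    params = Chem.SmilesParserParams()
    params.removeHs = remove_hs  # True is the default behaviour of MolFromSmiles
    mol = Chem.MolFromSmiles(smiles, params)
    return mol.GetNumAtoms()


if __name__ == "__main__":
    print(count_atoms(CORE_SMILES))         # 8: the explicit [H] is removed
    print(count_atoms(CORE_SMILES, False))  # 9: the explicit [H] is kept as an atom
```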


Re: [Rdkit-discuss] Chembience

2018-05-13 Thread Markus Sitzmann
Hello,

I have released Chembience 0.2.0: it includes an update to RDKit 2018.03
and also provides Jupyter as a new base App container type.

https://github.com/chembience/chembience

(So, assuming you have Docker and docker-compose installed on your
computer, you are a few easy commands away from your personal Jupyter
notebook server with all the RDKit 2018.03 goodness readily available.)

Best,
Markus


On Tue, Apr 24, 2018 at 10:44 AM Markus Sitzmann wrote:

> Hello,
>
> since it includes RDKit as one of its major components I am happy to
> announce the first release of my new open-source project Chembience:
>
> A Docker-based, cloudable platform for the development of
> chemoinformatics-centric web applications and microservices.
>
> https://github.com/chembience/chembience
>
> (unfortunately it is still on RDKit 2017.09.3; I didn't manage to release it
> before 2018.03 :-) ).
>
> Best,
> Markus
>