Hi Sebastian,

The reason this had me confused is that atom-mapping information is not
taken into account during substructure matching. This is true of both
molecules and reactions.
I also tried to reproduce your examples in Python and failed completely.

Your question about threshold_unmapped_reactant_atoms is what gave me the
clue.

Here's what's going on:
By default the cartridge code does an extra step after reading a reaction
from SMILES/SMARTS: it looks at all the reactants and moves to the agents
any that don't have a sufficient fraction of mapped atoms. We do this by
default because the reactions that we found "in the wild" often have
agents, solvents, etc. mixed in with the reactants. The key parameter used
there is threshold_unmapped_reactant_atoms, which defaults to 0.2.
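
As a rough illustration of the arithmetic (not the cartridge code itself),
here is a small Python sketch. The function names are made up, and I'm
assuming the test is a strict "mapped fraction < threshold" on the atoms
written in the reactant SMARTS:

```python
def mapped_fraction(n_mapped, n_atoms):
    """Fraction of a reactant's atoms that carry an atom-map number."""
    if n_atoms <= 0:
        raise ValueError("a reactant needs at least one atom")
    return n_mapped / n_atoms


def moved_to_agents(n_mapped, n_atoms, threshold=0.2):
    """Hypothetical helper: True if the reactant would be reclassified.

    Assumption: the reactant is moved when its mapped fraction is
    strictly below the threshold (0.2 is the default quoted above).
    """
    return mapped_fraction(n_mapped, n_atoms) < threshold


# The query reactant '[S:1]1C(Cl)=NC=C1' has 6 atoms, 1 of them mapped.
one_mapped = moved_to_agents(1, 6)
# With '[S:1]1C(Cl)=NC=[CH:2]1' there are 2 mapped atoms out of 6.
two_mapped = moved_to_agents(2, 6)

print(one_mapped, two_mapped)
```

Under those assumptions, one mapped atom out of six gives 1/6 (below 0.2),
so the reactant is moved; two mapped atoms give 2/6 (above 0.2), so it
stays. That fits the "one mapped atom fails, two or more work" pattern you
saw. The sketch does not try to model the completely unmapped query, which
still matched in your first test.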

Here you can see the difference for your example:

chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=false;
SET
chembl_23=> SELECT
reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
@>
reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
 ?column?
----------
 t
(1 row)


chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=true;
SET
chembl_23=> SELECT
reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
@>
reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
 ?column?
----------
 f
(1 row)



You can see in detail what's happening here:

chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=true;
SET
chembl_23=> select
reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
                reaction_from_smarts
----------------------------------------------------
 >ClC1=NC=C[S:1]1>O=S1(=O)N=C(C2=CN=C(Cl)[S:1]2)CO1
(1 row)


chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=false;
SET
chembl_23=> select
reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
                reaction_from_smarts
----------------------------------------------------
 ClC1=NC=C[S:1]1>>O=S1(=O)N=C(C2=CN=C(Cl)[S:1]2)CO1
(1 row)
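
To make the two outputs above easier to compare: reaction SMILES has the
form reactants>agents>products, with components inside each part separated
by dots. A minimal Python sketch (the helper name is made up, and it only
handles plain reaction SMILES like the ones above) that splits the two
strings:

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into three lists of components."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # Components within each part are dot-separated; drop empty strings.
    reactants, agents, products = (
        [piece for piece in part.split(".") if piece] for part in parts
    )
    return reactants, agents, products


# Output with move_unmmapped_reactants_to_agents=true (copied from above)
moved = split_reaction_smiles(
    ">ClC1=NC=C[S:1]1>O=S1(=O)N=C(C2=CN=C(Cl)[S:1]2)CO1"
)
# Output with move_unmmapped_reactants_to_agents=false (copied from above)
kept = split_reaction_smiles(
    "ClC1=NC=C[S:1]1>>O=S1(=O)N=C(C2=CN=C(Cl)[S:1]2)CO1"
)

print("moved:", moved)
print("kept: ", kept)
```

In the first case the thiazole ends up in the agents list and the
reactants list is empty; in the second it is still a reactant.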



I hope this helps,
-greg



On Tue, Apr 24, 2018 at 9:34 AM Sebastian Wandernoth <s_wandern...@gmx.de>
wrote:

> Hey guys,
>
> I'm still working on my search engine for reactions and I'm a bit puzzled
> as to what RDKit does with atom mapping information.
> I'm still working with the PostgreSQL cartridge version 0.73.0, which
> should correspond to the release 2017.9.3.
>
> I'm starting off with this example reaction which is fully mapped
> ([S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1)
>
> If I'm using a completely unmapped reaction as query I expect to find this
> one. So the following should return TRUE:
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @> reaction_from_smarts('S1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2SC(Cl)=NC=2)=N1');
>
> ... which it does
>
>
> Next step is to map one atom correctly in the query and try again. I still
> expect this to return TRUE:
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @>
> reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
>
> ... which it doesn't
>
>
> With two atoms mapped correctly in the query, I wouldn't expect to get
> different results from the previous try:
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @>
> reaction_from_smarts('[S:1]1C(Cl)=NC=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1]C(Cl)=NC=2)=N1');
>
> ... this one however returns TRUE again
>
>
> The final thing I tried was to include a wrong mapping in the query. I
> would definitely expect to get back FALSE here (I'm mapping one sulfur
> atom to a carbon atom and a nitrogen to an oxygen):
>
> SELECT
> reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1')
> @>
> reaction_from_smarts('[S:1]1C(Cl)=[N:2]C=C1>>S(=[O:2])(=O)1O[CH2:1]C(C2SC(Cl)=NC=2)=N1');
>
> ... however this returns TRUE yet again.
>
> Playing around with it a bit more I found that whatever single atom I map
> in the query, I always get back FALSE and if I map more than one atom, I
> always get back TRUE...
> Does this have something to do with the parameter
> 'rdkit.threshold_unmapped_reactant_atoms'? My suspicion is that RDKit only
> counts how many atoms are mapped and does not compare them to the correct
> mapping. Can you confirm this?
> Is there any way at all to include atom mapping in the query to filter the
> reactions the way I want to?
>
>
> I hope you guys can help me here. Sorry for the lengthy question, but I
> wanted to include as much information as possible for you to pinpoint the
> issue.
>
> Best regards
> Sebastian
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>