Re: [Rdkit-discuss] Understanding Coloring in Similarity Maps

2019-08-22 Thread Axel Pahl

Hi Sereina,

thanks a ton for the elaborate and very helpful explanation.

Kind regards,
Axel

On 22.08.19 12:23, Sereina wrote:

Hi Axel,

What is calculated in the function GetAtomicWeightsForModel() is the
difference between the probability value of the complete molecule
(“base probability”) and the probability value when the bits of a
certain atom are deleted.
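
To make the calculation concrete, here is a minimal, library-free sketch of the idea described above. It is not the RDKit source: the names atomic_weights, toy_fp and toy_proba are hypothetical stand-ins, the "fingerprint" is a three-bit toy, and the sign convention (base probability minus the probability without the atom's bits) is stated here as an assumption.

```python
from typing import Callable, List, Sequence


def atomic_weights(
    n_atoms: int,
    fp_function: Callable[[int], Sequence[int]],
    prediction_function: Callable[[Sequence[int]], float],
) -> List[float]:
    """Weight per atom: base probability minus probability without that atom's bits."""
    # Assumption: atom id -1 means "remove nothing", i.e. the complete molecule.
    base_proba = prediction_function(fp_function(-1))
    weights = []
    for atom_id in range(n_atoms):
        new_proba = prediction_function(fp_function(atom_id))
        weights.append(base_proba - new_proba)
    return weights


def toy_fp(atom_id: int) -> List[int]:
    """Hypothetical fingerprint: atom i sets exactly bit i; removing it clears the bit."""
    fp = [1, 1, 1]
    if 0 <= atom_id < len(fp):
        fp[atom_id] = 0
    return fp


def toy_proba(fp: Sequence[int]) -> float:
    """Hypothetical model: bit 0 raises P(active), bit 1 lowers it, bit 2 is ignored."""
    return 0.10 + 0.05 * fp[0] - 0.08 * fp[1]


weights = atomic_weights(3, toy_fp, toy_proba)
for atom_id, weight in enumerate(weights):
    print(f"atom {atom_id}: weight {weight:+.2f}")
```

In this toy case the atom whose bit lowers the active probability gets a negative weight, because deleting its bit raises the predicted probability above the base value.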

In the cookbook (and based on a quick glance also in your code), the
probability of the active class is used as the measure for the
similarity maps (that’s defined in the getProba() helper function).
This means that any atom whose missing bits lead to an increase in the
probability to be active is colored green. If it leads to a decrease,
it gets colored pink.

Now if you have an inactive molecule then your base probability for
the active class is close to zero. In your cases it looks like nearly
all of the atoms in the molecule are necessary to make these molecules
be considered inactive. In other words, deleting any of the green-colored
atoms results in a higher probability to be active – although it might
still be below 50% (note that the color range is not standardized
globally but based on the largest difference observed in the molecule).
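
The note about the color range can be illustrated with a small sketch. It assumes the weights are scaled by the largest absolute value within the molecule and that a weight is the base probability minus the probability without the atom; the helper name standardize and all numbers are made up for illustration.

```python
from typing import List, Sequence


def standardize(weights: Sequence[float]) -> List[float]:
    """Scale weights so that the largest absolute value in this molecule becomes 1."""
    largest = max(abs(w) for w in weights)
    if largest == 0:
        return [0.0 for _ in weights]
    return [w / largest for w in weights]


# Hypothetical inactive molecule: P(active) for the complete molecule is 0.02.
base_proba = 0.02
# Weight = base probability minus probability without the atom (assumed convention),
# so a weight of -0.03 means P(active) rises to 0.05 when that atom's bits are deleted.
weights = [-0.03, -0.01, 0.0]

scaled = standardize(weights)
proba_without_atom = [base_proba - w for w in weights]

for atom_id, (s, p) in enumerate(zip(scaled, proba_without_atom)):
    print(f"atom {atom_id}: scaled weight {s:+.2f}, P(active) without atom {p:.2f}")
```

Here the first atom receives the full color intensity for this molecule although deleting it only moves the active probability from 0.02 to 0.05, which is still far below 50%.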

I hope this helps.

Best,
Sereina



On 22 Aug 2019, at 11:38, Axel Pahl <axelp...@gmx.de> wrote:

Dear fellow RDKitters,

I am experimenting with the classification example from the Cookbook
[1] using a RandomForestClassifier and Similarity Maps for visualization.
I need, however, some help with the interpretation of the coloring in
the similarity map.
In the attached example, the compounds were correctly predicted
("AC_Pred") as being inactive ("0") with a high probability.
But the corresponding similarity maps show mainly green areas,
indicating (in my understanding) a positive contribution to the
activity class, which should have led to a different prediction.

What would be the correct interpretation of the coloring?
Many thanks in advance for any help.

Kind regards,
Axel

P.S.: The code is available in a repo [2]; an example notebook can be
found in the tutorials folder.

[1] http://www.rdkit.org/docs/Cookbook.html#using-scikit-learn-with-rdkit
[2] https://github.com/apahl/mol_frame

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



