ibelyakov opened a new pull request, #256:
URL: https://github.com/apache/ignite-extensions/pull/256

   The issue occurs when a “pure” node (impurity<sup>*</sup> = 0) is 
present in the tree. We calculate impurity only for child nodes, not for the 
current node, and we do not check whether the node is already “pure”, that is, 
contains just one label. As a result, the “bestSplit” calculation runs on an 
already “pure” node and decides that all items should go to the left child 
node and none to the right (leaf node), which yields 2 “pure” child nodes. 
Since the impurity of the current (parent) node is never calculated, the 
`parentNode.getImpurity() - split.get().getImpurity() > minImpurityDelta` 
check is always true, and we keep splitting the already “pure” node until 
the max tree depth is reached.
   The following changes were made to resolve the issue:
   
   1. A gain<sup>**</sup> calculation and a gain check for the split were added.
   2. A node impurity check was added: once the impurity reaches 0, the node 
is “pure” and no split needs to be calculated for it.
   3. The Gini impurity calculation was changed to `1 - sum(p^2)`, which gives 
correct values in the range from 0 to 0.5, as required for the Gini index.
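
   As an illustration of items 2 and 3, here is a minimal sketch of the 
`1 - sum(p^2)` formula and the purity check. It is not the code from this pull 
request: the class `GiniSketch`, the methods `giniImpurity` and `isPure`, the 
per-label count array, and the tolerance `EPS` are all hypothetical names 
chosen for the example. The 0.5 upper bound applies to the two-label case.

```java
/** Hypothetical helper, not part of the Ignite API. */
final class GiniSketch {
    /** Assumed tolerance for treating an impurity value as zero. */
    private static final double EPS = 1e-12;

    /** Gini impurity 1 - sum(p_i^2), where p_i is the share of label i in the node. */
    static double giniImpurity(int[] labelCounts) {
        long total = 0;
        for (int cnt : labelCounts)
            total += cnt;

        // An empty node has nothing to separate, so treat it as pure.
        if (total == 0)
            return 0.0;

        double sumSq = 0.0;
        for (int cnt : labelCounts) {
            double p = (double) cnt / total;
            sumSq += p * p;
        }

        return 1.0 - sumSq;
    }

    /** A node is "pure" when its impurity is 0, so no split is needed. */
    static boolean isPure(int[] labelCounts) {
        return giniImpurity(labelCounts) < EPS;
    }

    public static void main(String[] args) {
        System.out.println(giniImpurity(new int[] {10, 0})); // prints 0.0 (one label)
        System.out.println(giniImpurity(new int[] {5, 5}));  // prints 0.5 (1:1 ratio)
    }
}
```

   With a check like `isPure` in place before the split search, a node that 
holds a single label becomes a leaf immediately instead of being split again.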
   
   <sup>*</sup> Impurity is a value from 0 to 0.5 that shows whether the 
node is “pure” (impurity = 0, just 1 label) or “impure”; impurity = 0.5 is 
the worst case, where the label ratio is 1:1.
   <sup>**</sup> Gain is the difference between the parent node’s impurity and 
the weighted impurity of its child nodes. The split that provides the maximum 
gain is considered the best. See 
https://www.learndatasci.com/glossary/gini-impurity/
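
   The gain definition above can be sketched as follows. This is an 
illustration under assumptions, not the implementation from this pull request: 
`GainSketch`, `gini`, `gain` and `shouldSplit` are hypothetical names, and 
nodes are represented as plain per-label count arrays. Only 
`minImpurityDelta` is taken from the description.

```java
/** Hypothetical helper, not part of the Ignite API. */
final class GainSketch {
    /** Gini impurity 1 - sum(p_i^2); an empty node is treated as pure (0). */
    static double gini(int[] labelCounts) {
        long total = size(labelCounts);
        if (total == 0)
            return 0.0;

        double sumSq = 0.0;
        for (int cnt : labelCounts) {
            double p = (double) cnt / total;
            sumSq += p * p;
        }

        return 1.0 - sumSq;
    }

    /** Number of items in a node. */
    static long size(int[] labelCounts) {
        long total = 0;
        for (int cnt : labelCounts)
            total += cnt;
        return total;
    }

    /** Gain = parent impurity minus the size-weighted impurity of both children. */
    static double gain(int[] parent, int[] left, int[] right) {
        long nLeft = size(left);
        long nRight = size(right);
        long n = nLeft + nRight;
        if (n == 0)
            return 0.0;

        double weighted = ((double) nLeft / n) * gini(left)
            + ((double) nRight / n) * gini(right);

        return gini(parent) - weighted;
    }

    /** Accept a split only if the parent is impure and the gain exceeds the threshold. */
    static boolean shouldSplit(int[] parent, int[] left, int[] right, double minImpurityDelta) {
        if (gini(parent) == 0.0)
            return false; // Already "pure": nothing to gain from splitting.

        return gain(parent, left, right) > minImpurityDelta;
    }

    public static void main(String[] args) {
        // A 5:5 parent split into two single-label children.
        System.out.println(gain(new int[] {5, 5}, new int[] {5, 0}, new int[] {0, 5})); // prints 0.5
        // A pure parent whose "split" moves everything to the left child.
        System.out.println(shouldSplit(new int[] {10, 0}, new int[] {10, 0}, new int[] {0, 0}, 0.0)); // prints false
    }
}
```

   In the second call the gain is 0, so the degenerate split described at the 
top (all items to the left, none to the right) is rejected rather than 
repeated until the max tree depth.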


