[ 
https://issues.apache.org/jira/browse/SPARK-28222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eneriwrt updated SPARK-28222:
-----------------------------
    Description: 
Feature importance values obtained in a binary classification project differ 
depending on whether Spark 2.3.3 or 2.4.0 is used. This happens with both 
Random Forest and GBT. The values that match the sklearn output are those 
from version 2.3.3. 

As an example:

*SPARK 2.4*
 MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 
0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 
0.06324418636918638]
 MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 
0.055888697733790925]
 MODEL GradientBoostingClassifier [0.0, 0.7555555555555556, 
0.24444444444444438, 0.0, 1.4602196686471875e-17, 0.0]

*SPARK 2.3.3*
 MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 
0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 
0.05991085303585305]
 MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 
0.055888697733790925]
 MODEL GradientBoostingClassifier [0.0, 0.7555555555555555, 
0.24444444444444438, 0.0, 2.4326753518951276e-17, 0.0]

  was:
Feature importance values obtained in a binary classification project outputs 
different values if 2.3.3 version used or 2.4.0. It happens in Random Forest 
and GBT. Turns out that values that are equal than sklearn output are 2.3.3 
version. 

As an example:

*SPARK 2.4*
 MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 
0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 
0.06324418636918638]
 MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 
0.055888697733790925]
 MODEL GradientBoostingClassifier [0.0, 0.7555555555555556, 
0.24444444444444438, 0.0, 1.4602196686471875e-17, 0.0]

*SPARK 2.3.3*
 MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 
0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 
0.05991085303585305]
 MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 
0.055888697733790925]
 MODEL GradientBoostingClassifier [0.0, 0.7555555555555555, 
0.24444444444444438, 0.0, 2.4326753518951276e-17, 0.0]


> Feature importance outputs different values in GBT and Random Forest in 2.3.3 
> and 2.4 pyspark version
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28222
>                 URL: https://issues.apache.org/jira/browse/SPARK-28222
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
>            Reporter: eneriwrt
>            Priority: Minor
>
> Feature importance values obtained in a binary classification project differ 
> depending on whether Spark 2.3.3 or 2.4.0 is used. This happens with both 
> Random Forest and GBT. The values that match the sklearn output are those 
> from version 2.3.3. 
> As an example:
> *SPARK 2.4*
>  MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 
> 0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 
> 0.06324418636918638]
>  MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
> 0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 
> 0.055888697733790925]
>  MODEL GradientBoostingClassifier [0.0, 0.7555555555555556, 
> 0.24444444444444438, 0.0, 1.4602196686471875e-17, 0.0]
> *SPARK 2.3.3*
>  MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 
> 0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 
> 0.05991085303585305]
>  MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
> 0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 
> 0.055888697733790925]
>  MODEL GradientBoostingClassifier [0.0, 0.7555555555555555, 
> 0.24444444444444438, 0.0, 2.4326753518951276e-17, 0.0]
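
To see which features actually move between the two versions, the vectors quoted above can be compared element by element. The following is only a minimal sketch: the numbers are copied verbatim from the report, while the helper name differing_features and the 1e-9 tolerance are assumptions made for illustration and are not part of Spark or of the original report.

```python
# Feature-importance vectors copied verbatim from the report.
SPARK_2_4 = {
    "RandomForestClassifier_gini": [
        0.0, 0.4117930839002269, 0.06894132653061226,
        0.15857667209786705, 0.2974447311021076, 0.06324418636918638],
    "RandomForestClassifier_entropy": [
        0.0, 0.3864372497988694, 0.06578883597468652,
        0.17433924485055197, 0.31754597164210124, 0.055888697733790925],
    "GradientBoostingClassifier": [
        0.0, 0.7555555555555556, 0.24444444444444438,
        0.0, 1.4602196686471875e-17, 0.0],
}
SPARK_2_3_3 = {
    "RandomForestClassifier_gini": [
        0.0, 0.40957086167800455, 0.06894132653061226,
        0.16413222765342259, 0.2974447311021076, 0.05991085303585305],
    "RandomForestClassifier_entropy": [
        0.0, 0.3864372497988694, 0.06578883597468652,
        0.18789704501922055, 0.30398817147343266, 0.055888697733790925],
    "GradientBoostingClassifier": [
        0.0, 0.7555555555555555, 0.24444444444444438,
        0.0, 2.4326753518951276e-17, 0.0],
}


def differing_features(new, old, tol=1e-9):
    """Return (index, new - old) for every feature whose importance
    differs by more than tol. The tolerance is an assumed threshold
    chosen to ignore floating-point noise."""
    return [(i, a - b)
            for i, (a, b) in enumerate(zip(new, old))
            if abs(a - b) > tol]


if __name__ == "__main__":
    for model, importances in SPARK_2_4.items():
        diffs = differing_features(importances, SPARK_2_3_3[model])
        print(model, diffs)
```

With this assumed tolerance, the gini forest differs at feature indices 1, 3 and 5, the entropy forest at indices 3 and 4, and the GBT vectors quoted here differ only at the level of floating-point rounding.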



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
