eneriwrt created SPARK-28222:
--------------------------------

             Summary: Feature importance outputs different values in GBT and 
Random Forest in 2.3.3 and 2.4 pyspark version
                 Key: SPARK-28222
                 URL: https://issues.apache.org/jira/browse/SPARK-28222
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.4.3, 2.4.2, 2.4.1, 2.4.0
            Reporter: eneriwrt


Feature importance values obtained in a binary classification project differ 
depending on whether Spark 2.3.3 or 2.4.0 is used. This happens with both 
Random Forest and GBT.

As an example:

*SPARK 2.4*
MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 
0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 
0.06324418636918638]
MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 
0.055888697733790925]
MODEL GradientBoostingClassifier [0.0, 0.7555555555555556, 0.24444444444444438, 
0.0, 1.4602196686471875e-17, 0.0]


*SPARK 2.3.3*
MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 
0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 
0.05991085303585305]
MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 
0.055888697733790925]
MODEL GradientBoostingClassifier [0.0, 0.7555555555555555, 0.24444444444444438, 
0.0, 2.4326753518951276e-17, 0.0]
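
To make the size of the discrepancy easier to see, the reported vectors can be 
compared element by element. The following is only a sketch in plain Python: 
the numbers are copied from the output above, while the helper name 
differing_features and the tolerance of 1e-12 are assumptions chosen for 
illustration and are not part of Spark.

```python
# Feature importances copied verbatim from the report above.
reported = {
    "RandomForestClassifier_gini": {
        "2.4": [0.0, 0.4117930839002269, 0.06894132653061226,
                0.15857667209786705, 0.2974447311021076,
                0.06324418636918638],
        "2.3.3": [0.0, 0.40957086167800455, 0.06894132653061226,
                  0.16413222765342259, 0.2974447311021076,
                  0.05991085303585305],
    },
    "RandomForestClassifier_entropy": {
        "2.4": [0.0, 0.3864372497988694, 0.06578883597468652,
                0.17433924485055197, 0.31754597164210124,
                0.055888697733790925],
        "2.3.3": [0.0, 0.3864372497988694, 0.06578883597468652,
                  0.18789704501922055, 0.30398817147343266,
                  0.055888697733790925],
    },
    "GradientBoostingClassifier": {
        "2.4": [0.0, 0.7555555555555556, 0.24444444444444438,
                0.0, 1.4602196686471875e-17, 0.0],
        "2.3.3": [0.0, 0.7555555555555555, 0.24444444444444438,
                  0.0, 2.4326753518951276e-17, 0.0],
    },
}


def differing_features(new, old, tol=1e-12):
    """Return (index, new - old) for every feature whose importance
    differs by more than tol (hypothetical helper, assumed tolerance)."""
    return [
        (i, a - b)
        for i, (a, b) in enumerate(zip(new, old))
        if abs(a - b) > tol
    ]


for model, versions in reported.items():
    diffs = differing_features(versions["2.4"], versions["2.3.3"])
    print(model, diffs)
```

With that tolerance, the gini forest differs at feature indices 1, 3 and 5 and 
the entropy forest at indices 3 and 4 (indices are 0-based positions in the 
vector). The GBT vectors differ only at the level of floating-point rounding, 
so no GBT feature is flagged.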



