[
https://issues.apache.org/jira/browse/SPARK-28222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean R. Owen resolved SPARK-28222.
----------------------------------
Resolution: Duplicate
> Feature importance outputs different values in GBT and Random Forest in 2.3.3
> and 2.4 pyspark version
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-28222
> URL: https://issues.apache.org/jira/browse/SPARK-28222
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
> Reporter: eneriwrt
> Priority: Minor
>
> Feature importance values obtained in a binary classification project differ
> depending on whether version 2.3.3 or 2.4.0 is used. This happens with both
> Random Forest and GBT. The values that match the sklearn output are the ones
> from version 2.3.3.
> As an example:
> *SPARK 2.4*
> MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269,
> 0.06894132653061226, 0.15857667209786705, 0.2974447311021076,
> 0.06324418636918638]
> MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694,
> 0.06578883597468652, 0.17433924485055197, 0.31754597164210124,
> 0.055888697733790925]
> MODEL GradientBoostingClassifier [0.0, 0.7555555555555556,
> 0.24444444444444438, 0.0, 1.4602196686471875e-17, 0.0]
> *SPARK 2.3.3*
> MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455,
> 0.06894132653061226, 0.16413222765342259, 0.2974447311021076,
> 0.05991085303585305]
> MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694,
> 0.06578883597468652, 0.18789704501922055, 0.30398817147343266,
> 0.055888697733790925]
> MODEL GradientBoostingClassifier [0.0, 0.7555555555555555,
> 0.24444444444444438, 0.0, 2.4326753518951276e-17, 0.0]
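
To see where the two versions diverge, the vectors quoted above can be compared element by element. This is only a sketch: the helper name differing_features and the 1e-9 tolerance are choices made here for illustration and are not part of the report. The numbers are copied from the quoted output.

```python
# Feature importance vectors copied from the report above.
spark_24 = {
    "RandomForestClassifier_gini": [
        0.0, 0.4117930839002269, 0.06894132653061226,
        0.15857667209786705, 0.2974447311021076, 0.06324418636918638],
    "RandomForestClassifier_entropy": [
        0.0, 0.3864372497988694, 0.06578883597468652,
        0.17433924485055197, 0.31754597164210124, 0.055888697733790925],
    "GradientBoostingClassifier": [
        0.0, 0.7555555555555556, 0.24444444444444438,
        0.0, 1.4602196686471875e-17, 0.0],
}
spark_233 = {
    "RandomForestClassifier_gini": [
        0.0, 0.40957086167800455, 0.06894132653061226,
        0.16413222765342259, 0.2974447311021076, 0.05991085303585305],
    "RandomForestClassifier_entropy": [
        0.0, 0.3864372497988694, 0.06578883597468652,
        0.18789704501922055, 0.30398817147343266, 0.055888697733790925],
    "GradientBoostingClassifier": [
        0.0, 0.7555555555555555, 0.24444444444444438,
        0.0, 2.4326753518951276e-17, 0.0],
}


def differing_features(a, b, tol=1e-9):
    """Return (index, a_i - b_i) for entries that differ by more than tol.

    tol is an assumed threshold used to ignore floating-point noise.
    """
    return [(i, x - y) for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > tol]


# Print, per model, the feature indices whose importance changed between versions.
for name in spark_24:
    print(name, differing_features(spark_24[name], spark_233[name]))
```

With this tolerance the two GradientBoostingClassifier vectors count as equal, since they differ only at the level of floating-point rounding, while the Random Forest vectors differ in a few entries.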