Joseph K. Bradley created SPARK-3934:
----------------------------------------
Summary: RandomForest bug in sanity check in DTStatsAggregator
Key: SPARK-3934
URL: https://issues.apache.org/jira/browse/SPARK-3934
Project: Spark
Issue Type: Bug
Components: MLlib
Reporter: Joseph K. Bradley
RandomForest fails when run on multiclass classification with a mix of
unordered categorical and continuous features. The bug is in the sanity checks
in getFeatureOffset and getLeftRightFeatureOffsets, which use the wrong indices
when checking whether features are unordered.
Proposal: Remove the sanity checks. They are not really needed, and fixing them
would require DTStatsAggregator to keep track of an extra set of indices
(for the feature subset).
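
To make the index mix-up concrete, here is a minimal sketch in Python rather than the actual MLlib Scala code. It assumes the "wrong indices" are positions within a node's feature subset being passed to a check that expects global feature indices. All names (UNORDERED_FEATURES, FEATURE_SUBSET, and the three functions) are hypothetical and are not part of Spark's API.

```python
# Hypothetical illustration only; these names are not Spark's API.

# Global feature indices: 0 = continuous, 1 = unordered categorical,
# 2 = continuous, 3 = continuous.
UNORDERED_FEATURES = {1}

# Features sampled for one tree node, given as global indices.
# A feature's position in this list is its index within the subset.
FEATURE_SUBSET = [1, 3]


def is_unordered(global_index):
    """Return True if the feature with this global index is unordered."""
    return global_index in UNORDERED_FEATURES


def check_with_subset_index(subset_index):
    """Mistaken check: treats the subset position as a global index."""
    return is_unordered(subset_index)


def check_with_global_index(subset_index):
    """Consistent check: maps the subset position to its global index first."""
    return is_unordered(FEATURE_SUBSET[subset_index])


if __name__ == "__main__":
    # Print both answers for each position in the subset.
    for i, global_index in enumerate(FEATURE_SUBSET):
        print(i, global_index,
              check_with_subset_index(i), check_with_global_index(i))
```

In this sketch the two checks disagree for both positions in the subset: the unordered feature at position 0 is reported as ordered, and the continuous feature at position 1 is reported as unordered. A correct check needs the subset-to-global mapping, which is the extra set of indices mentioned in the proposal.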