Joseph K. Bradley created SPARK-2756:
----------------------------------------
Summary: Decision Tree bugs
Key: SPARK-2756
URL: https://issues.apache.org/jira/browse/SPARK-2756
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 1.0.0
Reporter: Joseph K. Bradley
2 bugs:
Bug 1: Indexing is inconsistent in the aggregate calculations for unordered
features (multiclass classification with categorical features that have few
enough values to be treated as unordered, i.e.,
isSpaceSufficientForAllCategoricalSplits=true).
* updateBinForUnorderedFeature indexed agg as (node, feature, featureValue,
binIndex), where
** featureValue was from arr (so it was a feature value)
** binIndex was in [0, 2^(maxFeatureValue-1)-1)
* The rest of the code indexed agg as (node, feature, binIndex, label).
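A minimal sketch of why the two layouts conflict, assuming agg is a flat
row-major array. The sizes and the helper names (index_as_written,
index_as_read, decode_as_read) are hypothetical and are not taken from the
MLlib source; only the two tuple orders come from the description above.

```python
# All sizes are hypothetical, picked only to make the mismatch visible.
NUM_FEATURES = 1
NUM_CATEGORIES = 3   # distinct values of the categorical feature
NUM_BINS = 3         # bins per feature
NUM_CLASSES = 3      # labels 0, 1, 2


def index_as_written(node, feature, feature_value, bin_index):
    """Flat index for the (node, feature, featureValue, binIndex) layout."""
    return ((node * NUM_FEATURES + feature) * NUM_CATEGORIES
            + feature_value) * NUM_BINS + bin_index


def index_as_read(node, feature, bin_index, label):
    """Flat index for the (node, feature, binIndex, label) layout."""
    return ((node * NUM_FEATURES + feature) * NUM_BINS
            + bin_index) * NUM_CLASSES + label


def decode_as_read(flat_index):
    """Interpret a flat index under the (node, feature, binIndex, label) layout."""
    rest, label = divmod(flat_index, NUM_CLASSES)
    rest, bin_index = divmod(rest, NUM_BINS)
    node, feature = divmod(rest, NUM_FEATURES)
    return node, feature, bin_index, label


agg = [0] * (NUM_FEATURES * NUM_BINS * NUM_CLASSES)

# The update step records one point with featureValue=1 in bin 2.
written_at = index_as_written(0, 0, 1, 2)
agg[written_at] += 1

# A reader looking for (bin 2, label 1) consults a different cell.
read_at = index_as_read(0, 0, 2, 1)
print(written_at, read_at, agg[read_at])

# Under the reader's layout, the written cell means something else entirely.
print(decode_as_read(written_at))
```

In this toy setup the count is stored in one cell and looked up in another,
so the per-label statistics used for impurity would be wrong even though no
index is out of range.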
Bug 2: calculateGainForSplit (for classification):
* It returns dummy prediction values when either the left or right child has
0 weight. These values are incorrect for multiclass classification.
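One possible way to avoid a dummy value, shown here only as an illustration
and not necessarily the fix adopted in Spark, is to derive the prediction from
the combined class counts of both children, which stay meaningful even when
one side is empty. The function name predict_from_counts and the per-class
count lists are hypothetical.

```python
def predict_from_counts(left_counts, right_counts):
    """Return (majority label, its fraction) over both children combined.

    left_counts and right_counts are per-class counts of equal length
    (a hypothetical representation chosen for this sketch).
    """
    total = [l + r for l, r in zip(left_counts, right_counts)]
    weight = sum(total)
    if weight == 0:
        raise ValueError("node has no instances")
    # Majority class; ties resolve to the lowest label.
    label = max(range(len(total)), key=total.__getitem__)
    return label, total[label] / weight


# The left child is empty, yet the prediction still reflects the data.
print(predict_from_counts([0, 0, 0], [1, 5, 2]))
```

A fixed placeholder label is only ever right by accident with three or more
classes, whereas the majority label of the node is always well defined as
long as the node holds at least one instance.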