Joseph K. Bradley created SPARK-2756:
----------------------------------------

             Summary: Decision Tree bugs
                 Key: SPARK-2756
                 URL: https://issues.apache.org/jira/browse/SPARK-2756
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 1.0.0
            Reporter: Joseph K. Bradley


2 bugs:

Bug 1: Indexing is inconsistent for aggregate calculations for unordered 
features (in multiclass classification with categorical features, where the 
features have few enough values that they can be treated as unordered, 
i.e., isSpaceSufficientForAllCategoricalSplits=true).

* updateBinForUnorderedFeature indexed agg as (node, feature, featureValue, 
binIndex), where
** featureValue was from arr (so it was a feature value)
** binIndex was in the range [0, 2^(maxFeatureValue-1)-1)
* The rest of the code indexed agg as (node, feature, binIndex, label).
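
To illustrate why the two layouts conflict, here is a minimal Python sketch (not Spark code). It assumes agg is a flat array addressed in row-major order; the helper flat_index, the dimension sizes, and the sample coordinates are all hypothetical and chosen only for illustration.

```python
def flat_index(coords, dims):
    """Row-major offset of `coords` in an array of shape `dims`."""
    offset = 0
    for c, d in zip(coords, dims):
        if not 0 <= c < d:
            raise IndexError(f"coordinate {c} out of range for dimension {d}")
        offset = offset * d + c
    return offset


# Hypothetical sizes, chosen only for illustration.
num_nodes, num_features, num_bins, num_classes = 1, 2, 4, 3
num_feature_values = 3

# Layout assumed by "the rest of the code": (node, feature, binIndex, label).
reader_dims = (num_nodes, num_features, num_bins, num_classes)
# Layout described for updateBinForUnorderedFeature:
# (node, feature, featureValue, binIndex).
writer_dims = (num_nodes, num_features, num_feature_values, num_bins)

node, feature, bin_index, label, feature_value = 0, 1, 2, 1, 2

read_offset = flat_index((node, feature, bin_index, label), reader_dims)
write_offset = flat_index((node, feature, feature_value, bin_index), writer_dims)

# The writer and the reader address different cells for the same
# (node, feature, bin), so the reader sees counts it did not expect.
print(read_offset, write_offset)
```

Under these assumptions the update path and the read path disagree on which cell holds a given count, which is the kind of inconsistency described above.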

Bug 2: calculateGainForSplit (for classification):
* It returns dummy prediction values when either the left or right child has 
0 weight.  These values are incorrect for multiclass classification.
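
As a rough sketch of one alternative to a dummy value (an assumption for illustration, not necessarily the fix applied in Spark), the prediction can be derived from the per-class weights even when one child is empty. The function predict_from_counts and the sample counts below are hypothetical.

```python
def predict_from_counts(class_counts):
    """Return (label, probability) for the class with the largest weight."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node has no weight")
    label = max(range(len(class_counts)), key=lambda k: class_counts[k])
    return label, class_counts[label] / total


# Hypothetical 3-class node where the candidate split sends everything right.
left_counts = [0.0, 0.0, 0.0]   # left child has 0 weight
right_counts = [1.0, 2.0, 7.0]

# Combine the children so the prediction reflects the real label distribution
# instead of a fixed placeholder such as label 0 or 1.
parent_counts = [l + r for l, r in zip(left_counts, right_counts)]
label, prob = predict_from_counts(parent_counts)
print(label, prob)
```

A fixed placeholder can only be right by coincidence once there are more than two classes, whereas the majority class of the combined counts is always a valid label.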




--
This message was sent by Atlassian JIRA
(v6.2#6252)
