Seth Hendrickson created SPARK-14610:
----------------------------------------

             Summary: Remove superfluous split from random forest findSplitsForContinuousFeature
                 Key: SPARK-14610
                 URL: https://issues.apache.org/jira/browse/SPARK-14610
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: Seth Hendrickson


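Currently, the method findSplitsForContinuousFeature in random forest produces 
an unnecessary split. For example, if a continuous feature has the unique 
values {1, 2, 3}, then the splits generated by this method are 
{1|2,3}, {1,2|3} and {1,2,3|}. The last of these places every value on the 
same side, so it cannot separate any data and is superfluous. The following 
unit test asserts three splits for a feature with only three unique values, 
and is therefore incorrect: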

{code:title=rf.scala|borderStyle=solid}
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
assert(splits.length === 3)
{code}
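
To illustrate the expected behaviour, here is a minimal standalone sketch. It is not Spark's implementation: the helper name candidateThresholds is hypothetical, and it assumes a split sends samples with "feature <= threshold" to the left. Under that assumption, the largest distinct value is never a useful threshold, because nothing would remain on the right.

```scala
// Hypothetical helper, for illustration only (not part of Spark).
// Assumes a split sends samples with `feature <= threshold` to the left,
// so the largest distinct value is dropped: using it as a threshold
// would leave the right side empty.
def candidateThresholds(featureSamples: Array[Double]): Array[Double] = {
  val distinctSorted = featureSamples.distinct.sorted
  distinctSorted.dropRight(1) // safe on an empty array as well
}

val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val thresholds = candidateThresholds(featureSamples)
// Three unique values give two thresholds: {1|2,3} and {1,2|3}.
assert(thresholds.length == 2)
```

With this convention, a feature with n unique values yields at most n - 1 candidate splits, which suggests the assertion in the test above should expect 2 rather than 3.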



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
