Seth Hendrickson created SPARK-14610:
----------------------------------------
Summary: Remove superfluous split from random forest
findSplitsForContinuousFeature
Key: SPARK-14610
URL: https://issues.apache.org/jira/browse/SPARK-14610
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Seth Hendrickson
Currently, the method findSplitsForContinuousFeature in random forest produces
an unnecessary split. For example, if a continuous feature has the unique
values {1, 2, 3}, then the splits generated by this method are
{1|2,3}, {1,2|3} and {1,2,3|}. The last split sends every value to one side,
so it cannot separate any samples. The following unit test, which expects
three splits for three unique values, is therefore incorrect:
{code:title=rf.scala|borderStyle=solid}
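// 11 samples of one continuous feature with three unique values: 1, 2 and 3
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples,
  fakeMetadata, 0)
// Expects three splits, one of which leaves a side empty
assert(splits.length === 3)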
{code}
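To illustrate why the third split is superfluous, here is a minimal standalone
sketch. It is not Spark's implementation: SplitSketch, candidateThresholds and
partition are hypothetical names, and the rule "x <= t goes left" is an
assumption made for the example. Under that rule, every distinct value except
the largest is a useful threshold, while the largest leaves the right side
empty.

```scala
object SplitSketch {
  // Hypothetical helper, not part of Spark: returns the distinct sample
  // values in ascending order, dropping the largest one, since a threshold
  // at the maximum cannot separate any samples.
  def candidateThresholds(samples: Array[Double]): Array[Double] =
    samples.distinct.sorted.dropRight(1)

  // Assumed split rule for this sketch: values <= t go left, the rest right.
  def partition(samples: Array[Double], t: Double): (Array[Double], Array[Double]) =
    samples.partition(_ <= t)

  def main(args: Array[String]): Unit = {
    val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)

    // Report how many samples fall on each side of every useful threshold.
    candidateThresholds(featureSamples).foreach { t =>
      val (left, right) = partition(featureSamples, t)
      println(s"threshold=$t left=${left.length} right=${right.length}")
    }

    // A threshold at the largest value puts every sample on the left.
    val (_, rightAtMax) = partition(featureSamples, featureSamples.max)
    println(s"threshold=${featureSamples.max} right=${rightAtMax.length}")
  }
}
```

For the samples in the unit test, candidateThresholds yields two thresholds,
1.0 and 2.0, which matches the two non-degenerate splits {1|2,3} and {1,2|3}.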