[ https://issues.apache.org/jira/browse/SPARK-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Seth Hendrickson updated SPARK-14610:
-------------------------------------
Description:
Currently, the method findSplitsForContinuousFeature in random forest produces
an unnecessary split. For example, if a continuous feature has the distinct
values (1, 2, 3), then the candidate splits generated by this method are:
* {1|2,3}
* {1,2|3}
* {1,2,3|}
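The last split, {1,2,3|}, sends every value to the left and nothing to the
right, so it separates nothing. The following unit test, which expects three
splits for a feature with only three distinct values, is therefore incorrect: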
{code:title=rf.scala|borderStyle=solid}
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples,
fakeMetadata, 0)
assert(splits.length === 3)
{code}
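To make the point concrete, here is a minimal Scala sketch, not Spark's actual implementation. It assumes each distinct value can serve as a threshold t (samples with x <= t go left), and it ignores any binning or maxBins limit. The object and method names (SplitSketch, candidateThresholds, describe) are hypothetical. Because the largest distinct value would put every sample on the left, it is dropped, leaving n - 1 candidate splits for n distinct values:

```scala
// Hypothetical sketch, NOT Spark's implementation. Assumptions:
//  - every distinct value fits within the split budget (no binning/maxBins),
//  - a threshold t sends samples with x <= t to the left child.
object SplitSketch {

  /** Candidate thresholds for one continuous feature.
    * Every distinct value except the largest is a useful threshold; the
    * largest would put all samples on the left and none on the right.
    */
  def candidateThresholds(samples: Array[Double]): Array[Double] = {
    val distinctSorted = samples.distinct.sortWith(_ < _)
    distinctSorted.dropRight(1)
  }

  /** Render the partition of distinct values induced by threshold t, e.g. "{1,2|3}". */
  def describe(samples: Array[Double], t: Double): String = {
    val (left, right) = samples.distinct.sortWith(_ < _).partition(_ <= t)
    def fmt(xs: Array[Double]): String = xs.map(_.toInt).mkString(",")
    s"{${fmt(left)}|${fmt(right)}}"
  }

  def main(args: Array[String]): Unit = {
    val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
    val thresholds = candidateThresholds(featureSamples)
    // Prints one line per useful split: {1|2,3} and {1,2|3}
    thresholds.foreach(t => println(describe(featureSamples, t)))
    // Two useful splits for three distinct values, not three.
    assert(thresholds.length == 2)
  }
}
```

Run on the samples from the test above, this sketch prints {1|2,3} and {1,2|3}, which suggests the assertion should expect 2 splits rather than 3.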
was:
Currently, the method findSplitsForContinuousFeature in random forest produces
an unnecessary split. For example, if a continuous feature has unique values:
{1, 2, 3}, then the possible splits generated by this method are:
{1|2,3}, {1,2|3} and {1,2,3|}. The following unit test is quite clearly
incorrect:
{code:title=rf.scala|borderStyle=solid}
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples,
fakeMetadata, 0)
assert(splits.length === 3)
{code}
> Remove superfluous split from random forest findSplitsForContinuousFeature
> --------------------------------------------------------------------------
>
> Key: SPARK-14610
> URL: https://issues.apache.org/jira/browse/SPARK-14610
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Seth Hendrickson
>