[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vincent updated SPARK-17055:
----------------------------
Affects Version/s: (was: 2.0.0)
> add labelKFold to CrossValidator
> --------------------------------
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Vincent
> Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all
> samples into k groups. However, when data is gathered from different
> subjects and we want to avoid over-fitting to individual subjects, we need
> to hold out all samples with certain labels from the training data and put
> them into the validation fold, i.e. we want to ensure that the same label
> never appears in both the validation and training sets.
> Mainstream packages such as scikit-learn already support this
> cross-validation method (LabelKFold, later renamed GroupKFold).
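The splitting rule requested above can be sketched in plain Python. This is a minimal illustration, assuming the labels fit in memory on one machine. `label_k_fold` is a hypothetical helper, not part of Spark's CrossValidator or scikit-learn's API, and the greedy balancing of fold sizes (largest label groups first, each placed in the currently smallest fold) is one possible design choice rather than specified behavior.

```python
from collections import defaultdict


def label_k_fold(labels, k):
    """Split sample indices into k (train, validation) pairs such that all
    samples sharing a label land in the same validation fold, so no label
    appears in both the training and validation sets of any split.

    Hypothetical helper for illustration only; not a Spark or scikit-learn API.
    """
    # Collect the sample indices belonging to each label.
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    if len(by_label) < k:
        raise ValueError("need at least k distinct labels")

    # Assign whole label groups to folds: largest groups first, each going to
    # the fold with the fewest samples so far (ties broken by label for
    # deterministic output).
    groups = sorted(by_label.items(), key=lambda kv: (-len(kv[1]), repr(kv[0])))
    folds = [[] for _ in range(k)]
    for _, idxs in groups:
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(idxs)

    # Each fold serves once as the validation set; the rest form training.
    splits = []
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    subject_labels = ["a", "a", "b", "b", "b", "c", "d"]
    for train, val in label_k_fold(subject_labels, 2):
        print("train:", train, "validation:", val)
```

Unlike plain k-fold, the unit that is shuffled into folds here is a whole label group rather than an individual sample, so fold sizes can be uneven when label groups differ in size. In a Spark implementation the same assignment would presumably be computed over a DataFrame column instead of a Python list; this sketch only shows the splitting logic.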
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)