[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624928#comment-15624928 ]
Sean Owen commented on SPARK-17055:
-----------------------------------
I did so following the discussion on the PR that you were part of yesterday:
https://github.com/apache/spark/pull/14640#issuecomment-257205012
> add labelKFold to CrossValidator
> --------------------------------
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Vincent
> Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all
> the samples into k groups. But when data is gathered from different subjects
> and we want to avoid over-fitting, we want to hold out samples with certain
> labels from the training data and put them into the validation fold, i.e. we
> want to ensure that the same label is not in both the testing and training
> sets.
> Mainstream packages like scikit-learn already support such a cross-validation
> method.
> (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)
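The requested label-based split can be sketched in plain Python, independent of Spark. This is a minimal sketch under stated assumptions, not Spark's CrossValidator API and not scikit-learn's code: the function name `label_k_fold` is hypothetical, "label" is taken to mean a group identifier such as a subject ID, and the greedy step that places the largest label groups first into the currently smallest fold is an assumed design choice for keeping fold sizes similar.

```python
from collections import Counter
from typing import Hashable, Iterator, List, Sequence, Tuple


def label_k_fold(
    labels: Sequence[Hashable], k: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train_indices, validation_indices) pairs
    such that every distinct label falls into exactly one validation fold,
    so no label ever appears in both training and validation sets."""
    counts = Counter(labels)
    if k < 2 or k > len(counts):
        raise ValueError("k must be between 2 and the number of distinct labels")

    # Assumed balancing heuristic: visit label groups from largest to
    # smallest and assign each whole group to the fold with the fewest samples.
    fold_sizes = [0] * k
    fold_of_label = {}
    for label, size in sorted(counts.items(), key=lambda kv: -kv[1]):
        fold = min(range(k), key=lambda f: fold_sizes[f])
        fold_of_label[label] = fold
        fold_sizes[fold] += size

    # Each fold in turn is held out for validation; the rest is training data.
    for fold in range(k):
        validation = [i for i, lab in enumerate(labels) if fold_of_label[lab] == fold]
        train = [i for i, lab in enumerate(labels) if fold_of_label[lab] != fold]
        yield train, validation


if __name__ == "__main__":
    subject_ids = ["a", "a", "b", "b", "b", "c", "d"]
    for train_idx, val_idx in label_k_fold(subject_ids, k=2):
        print("train:", train_idx, "validation:", val_idx)
```

Unlike plain k-fold, which shuffles individual rows, this assigns whole label groups to folds, so k cannot exceed the number of distinct labels and fold sizes can only be approximately equal.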
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)