[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624114#comment-15624114 ]
Vincent commented on SPARK-17055:
---------------------------------
[~srowen] May I ask why this issue was closed? It would help us understand
the current guidelines if we are to implement more features in ML/MLlib.
Thanks.
> add labelKFold to CrossValidator
> --------------------------------
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Vincent
> Priority: Minor
>
> The current CrossValidator supports only k-fold, which randomly divides all the
> samples into k groups. But when data is gathered from different subjects and
> we want to avoid over-fitting, we want to hold out samples with certain labels
> from the training data and put them into the validation fold, i.e. we want to
> ensure that the same label does not appear in both the testing and training
> sets.
> Mainstream packages such as scikit-learn already support this cross-validation
> method.
> (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)
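
To make the requested behaviour concrete, here is a minimal sketch in plain Python of a label-aware k-fold split. It is not Spark code and not the scikit-learn implementation; the function name `label_k_fold`, the greedy size balancing, and the sample `subjects` list are all assumptions made for illustration. The only property it is meant to show is that every label ends up entirely in training or entirely in validation for each fold.

```python
from collections import Counter


def label_k_fold(labels, k):
    """Yield (train_indices, validation_indices) pairs.

    Hypothetical helper: all samples sharing a label land in the same fold,
    so no label appears on both sides of a split.
    """
    counts = Counter(labels)
    if k < 2 or k > len(counts):
        raise ValueError("k must be between 2 and the number of distinct labels")

    fold_sizes = [0] * k
    fold_of_label = {}
    # Greedy balancing (an assumption, not taken from the issue): place the
    # largest label groups first, each into the currently smallest fold.
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = fold_sizes.index(min(fold_sizes))
        fold_of_label[label] = target
        fold_sizes[target] += n

    for fold in range(k):
        train = [i for i, lab in enumerate(labels) if fold_of_label[lab] != fold]
        valid = [i for i, lab in enumerate(labels) if fold_of_label[lab] == fold]
        yield train, valid


# Illustrative data: one subject label per sample.
subjects = ["a", "a", "b", "b", "b", "c", "d", "d"]
splits = list(label_k_fold(subjects, k=2))
for train, valid in splits:
    print("train:", train, "validation:", valid)
```

A plain random k-fold would scatter samples from subject "b" across folds, so a model could be validated on a subject it has already seen during training; grouping by label first is what prevents that.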