[jira] [Comment Edited] (SPARK-17055) add labelKFold to CrossValidator

Vincent (JIRA) Mon, 22 Aug 2016 02:14:48 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430381#comment-15430381
 ]


Vincent edited comment on SPARK-17055 at 8/22/16 9:14 AM:
----------------------------------------------------------

well, a better model will have a better cv performance on validation data with 
unseen labels, so the final selected model will have a relatively better 
capability on predicting samples with unseen categories/labels in real case.


was (Author: vincexie):
well, a better model will have a better cv performance on data with unseen 
labels, so the final selected model will have a relatively better capability on 
predicting samples with unseen categories/labels in real case.

> add labelKFold to CrossValidator
> --------------------------------
>
>                 Key: SPARK-17055
>                 URL: https://issues.apache.org/jira/browse/SPARK-17055
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Vincent
>            Priority: Minor
>
> Current CrossValidator only supports k-fold, which randomly divides all the 
> samples in k groups of samples. But in cases when data is gathered from 
> different subjects and we want to avoid over-fitting, we want to hold out 
> samples with certain labels from training data and put them into validation 
> fold, i.e. we want to ensure that the same label is not in both testing and 
> training sets.
> Mainstream packages like Sklearn already supports such cross validation 
> method. 
> (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (SPARK-17055) add labelKFold to CrossValidator

Reply via email to