Vincent created SPARK-17055:
-------------------------------
Summary: add labelKFold to CrossValidator
Key: SPARK-17055
URL: https://issues.apache.org/jira/browse/SPARK-17055
Project: Spark
Issue Type: New Feature
Components: MLlib
Affects Versions: 2.0.0
Reporter: Vincent
Priority: Minor
The current CrossValidator only supports k-fold cross-validation, which randomly
divides all samples into k groups (folds). However, when data is gathered from
different subjects and we want to avoid over-fitting, we need to hold out all
samples with certain labels (e.g. a subject ID) from the training data and put
them into the validation fold. In other words, we want to ensure that the same
label never appears in both the training and the test sets.
Mainstream packages such as scikit-learn (sklearn) already support this
cross-validation method.
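For illustration, here is a minimal Python sketch of the proposed splitting. It assumes each sample carries a hashable label such as a subject ID. `label_k_fold` is a hypothetical name, not an existing Spark API. The greedy balancing of fold sizes is one reasonable choice, similar in spirit to scikit-learn's `LabelKFold` (later renamed `GroupKFold`), and is not a definitive implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def label_k_fold(
    labels: Sequence[Hashable], k: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs such that no label
    appears in both the training and the test indices of any split.

    Hypothetical helper for illustration; not part of Spark's API.
    """
    counts = Counter(labels)
    if len(counts) < k:
        raise ValueError("need at least k distinct labels")

    # Greedy balancing (an assumption, not from the issue): assign the
    # largest label groups first, each to the fold that currently holds
    # the fewest samples. Every label ends up in exactly one fold.
    fold_sizes = [0] * k
    fold_of_label: Dict[Hashable, int] = {}
    for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        fold = min(range(k), key=lambda f: fold_sizes[f])
        fold_of_label[label] = fold
        fold_sizes[fold] += n

    # Each fold serves once as the test set; all other folds form training.
    for fold in range(k):
        test = [i for i, lab in enumerate(labels) if fold_of_label[lab] == fold]
        train = [i for i, lab in enumerate(labels) if fold_of_label[lab] != fold]
        yield train, test


if __name__ == "__main__":
    subjects = ["a", "a", "b", "b", "b", "c", "d", "d"]
    for train, test in label_k_fold(subjects, k=3):
        # Print the labels held out in each test fold.
        print(sorted({subjects[i] for i in test}))
```

Because all samples sharing a label land in the same fold, each label is held out exactly once and never appears on both sides of a split. Balancing by sample count rather than by number of distinct labels is a design choice; with very uneven label sizes, perfectly equal folds are not always possible.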
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)