[ https://issues.apache.org/jira/browse/SPARK-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284119#comment-15284119 ]
Mahmoud Rawas commented on SPARK-7132:
--------------------------------------
It would probably be more beneficial to let users specify the splits themselves,
since they may decide to cache or persist the splits for better performance.

> Add fit with validation set to spark.ml GBT
> -------------------------------------------
>
> Key: SPARK-7132
> URL: https://issues.apache.org/jira/browse/SPARK-7132
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> In spark.mllib GradientBoostedTrees, we have a method runWithValidation which
> takes a validation set. We should add that to the spark.ml API.
> This will require a bit of thinking about how the Pipelines API should handle
> a validation set (since Transformers and Estimators only take 1 input
> DataFrame). The current plan is to include an extra column in the input
> DataFrame which indicates whether the row is for training, validation, etc.
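As a rough illustration of the indicator-column plan, here is a minimal pure-Python sketch that does not use Spark. It assumes rows are plain dicts, a hypothetical boolean column `isVal` marks validation rows, and the learner is squared-loss gradient boosting with single-feature decision stumps. The early-stopping rule (stop once validation MSE improves by less than `tol`, then keep the best prefix of trees) is an assumption in the spirit of `runWithValidation`, not its actual implementation. All function and parameter names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A "row" is a plain dict standing in for one DataFrame row (assumption:
# this sketch does not use Spark at all).
Row = Dict[str, float]


def split_by_indicator(rows: List[Row], indicator_col: str) -> Tuple[List[Row], List[Row]]:
    """Split one input table into (training, validation) using a boolean column."""
    train = [r for r in rows if not r[indicator_col]]
    valid = [r for r in rows if r[indicator_col]]
    return train, valid


def fit_stump(xs: List[float], residuals: List[float]) -> Callable[[float], float]:
    """Fit a one-split regression stump on a single feature, minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def mse(predict: Callable[[float], float], rows: List[Row], feature_col: str, label_col: str) -> float:
    """Mean squared error of `predict` over `rows`."""
    return sum((r[label_col] - predict(r[feature_col])) ** 2 for r in rows) / len(rows)


def fit_with_validation(rows: List[Row], feature_col: str = "x", label_col: str = "label",
                        indicator_col: str = "isVal", max_iter: int = 50,
                        step_size: float = 0.5, tol: float = 1e-6):
    """Hypothetical boosting fit that reads train/validation membership from one table."""
    train, valid = split_by_indicator(rows, indicator_col)
    if not train or not valid:
        raise ValueError("need at least one training row and one validation row")
    xs = [r[feature_col] for r in train]
    ys = [r[label_col] for r in train]
    base = sum(ys) / len(ys)  # initial constant prediction
    trees: List[Callable[[float], float]] = []

    def predict(x: float) -> float:
        return base + step_size * sum(tree(x) for tree in trees)

    best_err = mse(predict, valid, feature_col, label_col)
    best_n = 0
    for _ in range(max_iter):
        # For squared loss the negative gradient is the ordinary residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
        err = mse(predict, valid, feature_col, label_col)
        if err < best_err - tol:
            best_err, best_n = err, len(trees)
        else:
            break  # validation error stopped improving by at least tol
    del trees[best_n:]  # keep only the best-scoring prefix of trees
    return predict, best_n
```

Because the split travels with the data as a column, the estimator still consumes a single input table. A caller who wants to choose, cache, or persist the split themselves, as the comment above suggests, can compute the indicator column before calling.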
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)