[ https://issues.apache.org/jira/browse/SPARK-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15934736#comment-15934736 ]

Yanbo Liang edited comment on SPARK-17136 at 3/21/17 3:17 PM:
--------------------------------------------------------------

[~sethah] Thanks for the design doc.
One quick question: in your design, if we set these parameters on the optimizer, 
do we still support setting the same parameters on the estimator?
If so, why do we need two entry points for the same set of params? I saw your 
reply in the design doc proposing that params set on the optimizer take 
precedence over those set on the estimator. Doesn't that risk confusing users 
and add maintenance cost?
Will grid-search-based model selection in the current framework (such as 
CrossValidator) still work well?
I would prefer to keep these params in the estimators, make the optimizer layer 
an internal API, and let users register their own optimizer implementations, 
similar to how data sources are supported. I find this more aligned with the 
original [ML pipeline 
design|https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit#]
 which stores params outside a pipeline component.
Thanks.
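To make the concern concrete, here is a toy sketch in plain Python. It is not Spark's API: `Optimizer`, `Estimator`, `effective_params` and `grid_search` are hypothetical names, and the "optimizer overrides estimator" rule is only the precedence described above. It shows how a grid search that sweeps an estimator-level param can be silently defeated when the same param is also pinned on a nested optimizer.

```python
# Hypothetical sketch, NOT Spark's API: two entry points for the same params,
# with values set on the optimizer taking precedence over the estimator's.
from itertools import product


class Optimizer:
    """Hypothetical optimizer that may carry its own max_iter/tol."""

    def __init__(self, max_iter=None, tol=None):
        self.max_iter = max_iter
        self.tol = tol


class Estimator:
    """Hypothetical estimator that also exposes max_iter/tol."""

    def __init__(self, max_iter=100, tol=1e-6, optimizer=None):
        self.max_iter = max_iter
        self.tol = tol
        self.optimizer = optimizer

    def effective_params(self):
        # Assumed precedence rule: a value set on the optimizer wins.
        opt = self.optimizer
        max_iter = opt.max_iter if opt and opt.max_iter is not None else self.max_iter
        tol = opt.tol if opt and opt.tol is not None else self.tol
        return {"max_iter": max_iter, "tol": tol}


def grid_search(make_estimator, grid):
    """Toy grid search that only varies estimator-level params."""
    keys = list(grid)
    results = []
    for values in product(*(grid[k] for k in keys)):
        est = make_estimator()
        for k, v in zip(keys, values):
            setattr(est, k, v)
        results.append(est.effective_params())
    return results


# The user sweeps max_iter on the estimator, but the optimizer already pins
# max_iter=10, so every grid point ends up with the same effective value.
runs = grid_search(
    lambda: Estimator(optimizer=Optimizer(max_iter=10)),
    {"max_iter": [50, 100, 200]},
)
print(runs)
```

Nothing fails loudly here: the grid search "works" but explores a single configuration, which is the kind of confusion the precedence rule could introduce.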


> Design optimizer interface for ML algorithms
> --------------------------------------------
>
>                 Key: SPARK-17136
>                 URL: https://issues.apache.org/jira/browse/SPARK-17136
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>
> We should consider designing an interface that allows users to use their own 
> optimizers in some of the ML algorithms, similar to MLlib. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)