[ 
https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247870#comment-16247870
 ] 

yuhao yang commented on SPARK-18755:
------------------------------------

Thanks for all the interests. 

For anyone who wants to contribute on the item, IMO we need to support the 
Random Grid Search function as in sklearn or other popular libraries. The 
initial PR can start with a basic prototype but should contain plans for 
supporting future extension for function parity. Also since we add the 
randomized search primarily to speedup the tuning process, it's best if we can 
present some benchmark on public dataset to demonstrate the effectiveness.

also cc [~srowen] [~mlnick] [~yanboliang] [~holdenk] to see if any one has 
bandwidth to Shepherd this. I can help review. Thanks.


> Add Randomized Grid Search to Spark ML
> --------------------------------------
>
>                 Key: SPARK-18755
>                 URL: https://issues.apache.org/jira/browse/SPARK-18755
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: yuhao yang
>
> Randomized Grid Search  implements a randomized search over parameters, where 
> each setting is sampled from a distribution over possible parameter values. 
> This has two main benefits over an exhaustive search:
> 1. A budget can be chosen independent of the number of parameters and 
> possible values.
> 2. Adding parameters that do not influence the performance does not decrease 
> efficiency.
> Randomized Grid search usually gives similar result as exhaustive search, 
> while the run time for randomized search is drastically lower.
> For more background, please refer to:
> sklearn: http://scikit-learn.org/stable/modules/grid_search.html
> http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/
> http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
> https://www.r-bloggers.com/hyperparameter-optimization-in-h2o-grid-search-random-search-and-the-future/.
> There're two ways to implement this in Spark as I see:
> 1. Add searchRatio to ParamGridBuilder and conduct sampling directly during 
> build. Only 1 new public function is required.
> 2. Add trait RadomizedSearch and create new class RandomizedCrossValidator 
> and RandomizedTrainValiationSplit, which can be complicated since we need to 
> deal with the models.
> I'd prefer option 1 as it's much simpler and straightforward. We can support 
> Randomized grid search via some smallest change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to