[GitHub] [madlib] Advitya17 opened a new pull request #513: DL: [AutoML] Add support for 'diagonal' Hyperband optimized for MPP

GitBox Fri, 21 Aug 2020 18:19:59 -0700


Advitya17 opened a new pull request #513:
URL: https://github.com/apache/madlib/pull/513



   JIRA: MADLIB-{1447,1448,1449}
   
   We integrate AutoML capabilities into Apache MADlib by introducing a function 
called `madlib_keras_automl`, which combines setting up and running model 
selection in a single call and automates and accelerates model selection and 
training end-to-end. The user declaratively specifies the names of their 
train/val datasets, mst and output tables, model architecture and param grid 
details, the chosen method name and associated params, and various training 
details. Our API then handles scheduling and execution, and displays the 
algorithm workload info to the user.
   
   The first AutoML algorithm we implement is Hyperband, a state-of-the-art 
hyperparameter optimization algorithm which speeds up random search with 
adaptive resource allocation, successive halving (SHA) and early stopping. The 
algorithm generates a schedule from the user inputs and evaluates model 
configurations more efficiently by stopping weak configurations early and 
giving more resources to the more promising ones.
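
   As an illustration of the schedule mentioned above, here is a minimal 
sketch that follows the formulas of the published Hyperband algorithm. The 
function name `hyperband_schedule` is hypothetical and this is not the code in 
this PR; it assumes integer `R` and `eta`, and the actual implementation may 
round differently.

```python
def hyperband_schedule(R, eta):
    """Return {s: [(n_i, r_i), ...]} for brackets s = s_max .. 0.

    Hypothetical helper for illustration; assumes integer R and eta.
    """
    # An integer loop avoids floating-point error in floor(log(R)/log(eta)).
    s_max = 0
    while eta ** (s_max + 1) <= R:
        s_max += 1

    schedule = {}
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) * eta^s / (s + 1)).
        n = -(-(s_max + 1) * eta ** s // (s + 1))
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i        # configurations evaluated in round i
            r_i = R / eta ** (s - i)   # iterations given to each configuration
            rounds.append((n_i, r_i))
        schedule[s] = rounds
    return schedule


if __name__ == "__main__":
    for s, rounds in hyperband_schedule(81, 3).items():
        print(s, rounds)
```

Each bracket `s` starts with many configurations on a small budget and keeps 
roughly `1/eta` of them per round, so bracket `s_max` explores most 
aggressively and bracket `0` is plain random search at the full budget `R`.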
   
   In the case of MPP databases such as Greenplum, we further accelerate this 
algorithm by simultaneously evaluating rounds from multiple brackets that lie 
along a 'diagonal' of the Hyperband schedule, to keep machines busy and take 
advantage of the large distributed storage and compute power offered by 
Greenplum.
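
   One way to picture the 'diagonal' is the sketch below. It is an assumption 
about the grouping, not code from this PR: it places round `i` of bracket `s` 
on diagonal `d = i + (s_max - s)`, so that all rounds on a diagonal train for 
the same number of iterations and can be launched together. The function name 
`diagonals` is hypothetical.

```python
def diagonals(R, eta, s_max):
    """Group (bracket s, round i) pairs that share the same per-config budget.

    Hypothetical helper; assumes diagonal d holds the rounds with
    s - i == s_max - d.
    """
    groups = []
    for d in range(s_max + 1):
        # Brackets s_max, s_max - 1, ..., s_max - d contribute one round each.
        members = [(s, s - (s_max - d))
                   for s in range(s_max, s_max - d - 1, -1)]
        r = R / eta ** (s_max - d)  # iterations shared by every member
        groups.append((r, members))
    return groups


if __name__ == "__main__":
    for r, members in diagonals(81, 3, 4):
        print(r, members)
```

Under this reading there are `s_max+1` diagonals, each of which can be served 
by a single call that trains multiple models.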
   
   With the diagonal approach, we also introduce some low-level optimizations 
in the implementation that improve runtime and code quality by:
   
   1. Reducing the number of random search function calls from `s_max+1` to 
just `1`.
   2. Reducing the number of multiple model training function calls from 
`s_max(s_max+1)/2` to `s_max+1`.
   3. Reducing the number of sampled SHA configuration groups from `s_max+1` to 
`s_max+1-skip_last` (i.e. only sampling the configurations actually needed for 
evaluation).
   
   Key:
   R --> maximum amount of resources/iterations that can be allocated to a 
single configuration in any particular round of Hyperband
   eta --> factor controlling the proportion of configs discarded in each round 
of SHA
   s_max = floor(log(R)/log(eta)) --> controls the number of SHA brackets 
(=s_max+1) executed with Hyperband
   skip_last --> Number of diagonals to skip at the end (to avoid running the 
most time/resource intensive bracket(s) and/or to avoid overfitting or loss in 
predictive power). skip_last ∈ [0, s_max]
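
   To make the key concrete, the sketch below computes `s_max` and shows which 
brackets still need configurations sampled for a given `skip_last`. It assumes 
that the last round of every bracket lies on the final diagonal, so skipping 
the last `skip_last` diagonals trims that many rounds from each bracket. Both 
function names are hypothetical and are not part of this PR.

```python
def s_max_for(R, eta):
    """floor(log(R)/log(eta)) computed with integers to avoid rounding error."""
    s = 0
    while eta ** (s + 1) <= R:
        s += 1
    return s


def brackets_to_sample(R, eta, skip_last=0):
    """Map each bracket that still runs to its number of remaining rounds."""
    s_max = s_max_for(R, eta)
    if not 0 <= skip_last <= s_max:
        raise ValueError("skip_last must lie in [0, s_max]")
    # Bracket s has s + 1 rounds; dropping the last `skip_last` diagonals
    # leaves s + 1 - skip_last of them, so brackets with s < skip_last vanish.
    return {s: s + 1 - skip_last for s in range(s_max, skip_last - 1, -1)}


if __name__ == "__main__":
    print(s_max_for(81, 3))
    print(brackets_to_sample(81, 3, skip_last=1))
```

The number of keys in the returned mapping is `s_max+1-skip_last`, matching 
the number of sampled SHA configuration groups in item 3 above.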
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
