fmcquillan99 edited a comment on pull request #518:
URL: https://github.com/apache/madlib/pull/518#issuecomment-699189947


   (6)
   fmin definition
   https://github.com/hyperopt/hyperopt/wiki/FMin
   fmin(loss, space, algo, max_evals)
   
   (a)
   Looks like this PR is setting max_evals = num_models/num_segments.  For one thing, I'm not sure that
   ```
   self.num_workers = get_seg_number() * get_segments_per_host()
   ```
   gives the total number of workers: on a 1-host, 2-segments-per-host database this returned 4 instead of the expected 2.  Also, this needs to be consistent with the distribution rules set in the mini-batch preprocessor.
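   One possible explanation for the 4-vs-2 discrepancy (a sketch with hypothetical stand-ins; it assumes `get_seg_number()` already returns the cluster-wide segment total, so multiplying by segments-per-host double-counts):

```python
# Hypothetical stand-ins for the MADlib helpers, on a 1-host,
# 2-segments-per-host cluster.
def get_seg_number():
    # Assumption: this already returns the cluster-wide segment count.
    return 2  # 1 host * 2 segments per host

def get_segments_per_host():
    return 2

# The PR's formula double-counts under that assumption:
num_workers = get_seg_number() * get_segments_per_host()
print(num_workers)  # 4, not the expected 2
```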
   
   (b)
   Looks like the logic is: for each eval, find the losses for multiple models (= num segments) in parallel, then return the best loss value to hyperopt, then get a new set of params for multiple models from hyperopt.  Or does it pass the loss for *each* trained model to hyperopt?  If it is the former, then chunking/grouping like this might confuse hyperopt and affect the search in unknown ways.
   
   (7)
   We should take another look at the madlib hyperopt params to see if we have defined them in the right way.  Do we need an option for an explicit `max_evals`?
   
   (8)
   defaults
   Seems like hyperband defaults are being used for hyperopt in the case that the user does not specify them.

