fmcquillan99 edited a comment on pull request #518:
URL: https://github.com/apache/madlib/pull/518#issuecomment-699189947


   (6)
   fmin definition
   https://github.com/hyperopt/hyperopt/wiki/FMin
   fmin(loss, space, algo, max_evals)
   
   Looks like this PR is setting max_evals =  num_models/num_segments in 
`get_configs_list()`.  For one thing I'm not sure that 
   ```
   self.num_workers = get_seg_number() * get_segments_per_host()
   ```
   gives total number of workers?  On a 1 host, 2 segments-per-host database 
this returned 4 instead of the expected 2.  Also this needs to be consistent 
with the distribution rules set in the mini-batch preprocessor.
   
   (7)
   The function `get_configs_list()` may not be distributing the workload 
correctly to the segments.  
   For 
   ```
       num_models = 3
       num_workers = 3
   ```
   it returns `[(1, 3)]` which seems OK.  But for 
   ```
       num_models = 5
       num_workers = 3
   ```
   it returns `[(1, 5)]` which does not seem OK.  It will not do any hyperopt 
updates if all 5 configs are grouped together.  I would have expected `[(1, 3), 
(4, 5)]`.   For 
   ```
       num_models = 20
       num_workers = 3
   ```
   it returns `[(1, 4), (5, 8), (9, 11), (12, 14), (15, 17), (18, 20)]` which 
means it is running 4 configs on 3 segments multiple times which does not seem 
efficient.  I would have expected something like `[(1, 3), (4, 6), (7, 9), (10, 
12), (13, 15), (16, 18),(19, 20)]` which runs 3 configs at a time on the 3 
segments.
   
   (8)
   In `find_hyperopt_config()` confirm that the loss for *each* model that 
passed to hyperopt after training, and not just the best one from the group.
   
   (9)
   defaults
   Seems like hyperband defaults are being used for hyperopt in the case that 
use does not specify hyperband is not specified.  That will probably throw an 
error


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to