junrushao1994 commented on pull request #9859:
URL: https://github.com/apache/tvm/pull/9859#issuecomment-1006999426


   @comaniac Thanks for the extremely valuable feedback!
   
   > when training data gets bigger and bigger, the time to train the XGBoost 
cost model becomes tedious even the accuracy isn't further improved
   
   That's exactly what I'm observing too! In this particular case, the XGB 
hyper-parameters may no longer be suitable, which limits the model capacity, 
and we may have to tune them to find the best settings.
   
   > What Ansor has done is simply reduce the re-training frequency (e.g., 
re-train per 2 rounds) when training data size is larger than a threshold.
   
   This is how Ansor deals with it right now. We might consider better 
heuristics in the future, including switching models, tweaking model capacity 
with AutoML techniques, etc.
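   To make that throttling idea concrete, here is a minimal sketch. The 
`threshold` and `interval` values (and the function itself) are illustrative 
assumptions, not Ansor's actual defaults or API:
   
   ```python
   # Hypothetical sketch of Ansor-style re-training throttling:
   # re-train every round while the dataset is small, then only
   # every `interval` rounds once it grows past `threshold` records.
   def should_retrain(round_idx: int, num_records: int,
                      threshold: int = 10000, interval: int = 2) -> bool:
       if num_records <= threshold:
           return True  # dataset still small: always re-train
       return round_idx % interval == 0  # large dataset: throttle re-training
   ```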
   
   > we can also refer to the accuracy between the predicted cost and new 
measured latencies to determine whether to re-train the model in the next round
   
   With our current interface, this is pretty simple to do. We have a 
`validate` method that lets us check the RMSE of the cost model's 
predictions - and I used this method quite frequently when debugging the 
model too.
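   As a rough sketch of that accuracy-gated re-training idea (the helper names 
here are hypothetical, not the actual `validate` interface):
   
   ```python
   import math
   
   def rmse(predicted, measured):
       """Root-mean-square error between predicted costs and measured latencies."""
       assert predicted and len(predicted) == len(measured)
       return math.sqrt(
           sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)
       )
   
   def needs_retraining(predicted, measured, tolerance=0.1):
       # Skip re-training when the model already tracks fresh measurements
       # well enough; `tolerance` is an illustrative knob.
       return rmse(predicted, measured) > tolerance
   ```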
   
   
   Anyway, I think we are well aligned on the methodology and the path to 
improvement. Let's work together to improve it in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
