junrushao1994 commented on pull request #9859: URL: https://github.com/apache/tvm/pull/9859#issuecomment-1006999426
@comaniac Thanks for the extremely valuable feedback!

> when training data gets bigger and bigger, the time to train the XGBoost cost model becomes tedious even the accuracy isn't further improved

That's exactly what I'm observing too! In this particular case, the XGBoost hyperparameters may no longer be suitable, which limits the model's capacity, and we may have to experiment to find better ones.

> What Ansor has done is simply reduce the re-training frequency (e.g., re-train per 2 rounds) when training data size is larger than a threshold.

This is how Ansor deals with it right now. We might consider better heuristics in the future, including switching models, tuning model capacity with AutoML techniques, etc.

> we can also refer to the accuracy between the predicted cost and new measured latencies to determine whether to re-train the model in the next round

With our current interface this is straightforward: we have a `validate` method that reports the RMSE of the cost model's predictions, and I used it quite frequently while debugging the model too.

Anyway, I think we are well aligned on the methodology and the path to improvement. Let's work together to improve it in the future!
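The Ansor-style heuristic mentioned above (retrain every round while the dataset is small, then only every few rounds once it passes a size threshold) can be sketched roughly as follows. All names, thresholds, and the function signature here are illustrative, not actual TVM or Ansor APIs:

```python
# Hypothetical sketch of a retraining-frequency heuristic: retrain every
# round while the dataset is small, but only every `interval` rounds once
# it grows past `size_threshold`. Names and defaults are assumptions.

def should_retrain(round_idx: int, num_records: int,
                   size_threshold: int = 2048, interval: int = 2) -> bool:
    """Return True if the cost model should be retrained this round."""
    if num_records <= size_threshold:
        # Small dataset: retraining is cheap, so do it every round.
        return True
    # Large dataset: only retrain on every `interval`-th round.
    return round_idx % interval == 0
```

The threshold and interval would in practice be tuned per workload, or replaced entirely by the smarter heuristics discussed above.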
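The accuracy-based trigger could look something like the sketch below: compare predicted costs against newly measured latencies and retrain only when the error exceeds a tolerance. This is a minimal illustration, not the actual `validate` interface in TVM:

```python
# Illustrative sketch (not the real TVM cost-model API): gate retraining
# on the RMSE between predicted costs and freshly measured latencies.
import math

def rmse(predicted, measured):
    """Root-mean-square error between predictions and measurements."""
    assert predicted and len(predicted) == len(measured)
    return math.sqrt(
        sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)
    )

def needs_retrain(predicted, measured, tolerance: float = 0.1) -> bool:
    """Retrain only when the prediction error exceeds `tolerance`."""
    return rmse(predicted, measured) > tolerance
```

In practice the tolerance would be relative to the latency scale of the workload, and this check could be combined with the frequency-based heuristic rather than replacing it.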
