rohangarg commented on PR #12818: URL: https://github.com/apache/druid/pull/12818#issuecomment-1198137176
@paul-rogers : Thanks a lot for review and the feedback! > We spin up the Docker cluster and then start tests. The current ITs don't check the status of the cluster before they run: they simply rely on the retry utility to handle any failures. I've noticed that, on a single machine, it can take significant time for the cluster to come up. The "new ITs" specifically check the health of each service before starting the tests, and I've seen that this can take a while. So, in the scenario you describe, it may be that the good historical comes up and serves the 50 queries before the bad one has had a chance to get itself organized. Yes, that's what my conjecture was too regarding the flakiness. The current test only has some application level checks which acts as a proxy for test setup valdiation too, which seems a bit shaky. The delay in test setups due to load on CI is probably the reason I can't reproduce it locally. Great to know that the new framework has first class support for test setup check! > So, an alternative fix would have been to check the health of all services before running the tests. Still, that leaves the statistical nature of the failure detection, which is unappealing. The deterministic mechanism you implemented seems much better. Yes, I had thought of that approach but didn't move in that direction unless we have a good reason. Seemed better to me to make the test verify only the query-retry mechanism on the bad historical. > One suggestion: please add your explanation of the new behavior to the test code itself as a comment. I had a hard time figuring out what the code was doing until I read your explanation, which cleared thing right up. Would be great to preserve that explanation for future readers of the code. Yes, totally agree with you. I have added more explanation to the existing code documentation in the test and also updated some of it which has become outdated after the re-write. Thanks for catching that! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
