maytasm opened a new pull request #10696: URL: https://github.com/apache/druid/pull/10696
Fix kinesis integration test

### Description

This is a follow-up to https://github.com/apache/druid/pull/10692. It turns out that I merged https://github.com/apache/druid/pull/10692 too soon and that the Kinesis integration test can still fail intermittently. This PR should now really fix it.

There were two causes of the intermittent Kinesis failures:
1) Part of the Kinesis IT verifies ingested data after the ingestion task has completed (so that segments are loaded onto historicals and are no longer "realtime"). The test did this by terminating the supervisor to force the ingestion task to complete. However, there seems to be a bug that can cause the running ingestion task to become stuck and keep running even after the supervisor has been terminated.
2) Some parts of the Kinesis IT may take some time to successfully verify the result. The current timeout/retry count can be too low.

This PR addresses these issues by:
1) Decreasing the task duration to 30 seconds. The test then waits until the task completes naturally and hands off its segments to historicals (instead of terminating the supervisor to force the task to complete).
2) Increasing the retry timeout from 10 minutes to 20 minutes. This is done by increasing the retry count from 120 to 240.

This PR also includes some nice-to-haves:
1) Update the API call for terminating the supervisor to use `/terminate`, since `/shutdown` is deprecated.
2) Add a shutdown task API call to the test teardown to make sure that all tasks are also properly cleaned up.
3) Update some logging to provide more information in case of test failure.

This PR has:
- [x] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
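
For reference, the retry numbers in the description (120 retries for 10 minutes, 240 retries for 20 minutes) imply a fixed delay of 5 seconds between attempts. The sketch below only illustrates that arithmetic and a generic fixed-delay retry loop; the class `RetryBudget` and its methods are hypothetical names, not the helper the integration tests actually use.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, for illustration only; not the Druid IT retry utility.
public final class RetryBudget {
    private RetryBudget() {}

    // Worst-case total wait when every attempt fails and is followed by one delay.
    public static Duration totalTimeout(int retryCount, Duration delay) {
        return delay.multipliedBy(retryCount);
    }

    // Delay between attempts implied by a total timeout and a retry count.
    public static Duration impliedDelay(Duration total, int retryCount) {
        return total.dividedBy(retryCount);
    }

    // Calls check up to retryCount times, sleeping for delay after each failure.
    // Returns true as soon as check passes, false if the budget is exhausted
    // or the thread is interrupted while waiting.
    public static boolean retryUntilTrue(BooleanSupplier check, int retryCount, Duration delay) {
        for (int attempt = 0; attempt < retryCount; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(delay.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

With these definitions, `impliedDelay(Duration.ofMinutes(10), 120)` is 5 seconds and `totalTimeout(240, Duration.ofSeconds(5))` is 20 minutes, which matches the before and after values quoted above.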

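
The teardown calls mentioned in the description can be sketched as plain URL construction. This assumes the Overlord paths `/druid/indexer/v1/supervisor/{supervisorId}/terminate` and `/druid/indexer/v1/task/{taskId}/shutdown` (both sent as POST requests); the class name, the base URL, and the IDs in the usage note are hypothetical, and the IDs are assumed to be URL-safe already.

```java
import java.net.URI;

// Hypothetical helper, for illustration only; not the client used by the Druid ITs.
public final class OverlordEndpoints {
    private OverlordEndpoints() {}

    // POST target that terminates a supervisor (replaces the deprecated /shutdown).
    public static URI terminateSupervisor(String overlordBase, String supervisorId) {
        return URI.create(overlordBase + "/druid/indexer/v1/supervisor/" + supervisorId + "/terminate");
    }

    // POST target that shuts down a single task, e.g. during test teardown.
    public static URI shutdownTask(String overlordBase, String taskId) {
        return URI.create(overlordBase + "/druid/indexer/v1/task/" + taskId + "/shutdown");
    }
}
```

For example, `terminateSupervisor("http://localhost:8090", "kinesis_sup")` yields `http://localhost:8090/druid/indexer/v1/supervisor/kinesis_sup/terminate`; the host, port, and supervisor ID here are placeholders.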