maytasm opened a new pull request #10696:
URL: https://github.com/apache/druid/pull/10696


   Fix Kinesis integration test
   
   ### Description
   This is a follow-up to https://github.com/apache/druid/pull/10692
   It turns out that I merged https://github.com/apache/druid/pull/10692 too 
soon; the Kinesis integration test can still fail intermittently. This 
PR should now really fix it.
   There were two causes of the intermittent Kinesis failures:
   1) Part of the Kinesis IT verifies ingested data after the ingestion task 
has completed (so that segments are loaded onto historicals and are no longer 
"realtime"). The test did this by terminating the supervisor to force the 
ingestion task to complete. However, there seems to be a bug that can leave 
the running ingestion task stuck, so that it keeps running even after the 
supervisor has terminated.
   2) Some parts of the Kinesis IT can take a while to verify the result 
successfully. The current timeout/retry count can be too low.
   
   This PR addresses these issues by:
   1) Decreasing the task duration to 30 seconds. The test then waits until the 
task completes naturally and hands off its segments to historicals (instead of 
terminating the supervisor to force the task to complete).
   2) Increasing the retry timeout from 10 minutes to 20 minutes. This is done by 
increasing the retry count from 120 to 240.
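
   As a rough sketch of the arithmetic behind the new timeout: 120 retries 
covering 10 minutes and 240 retries covering 20 minutes both imply about 5 
seconds between attempts. That sleep interval is inferred from the numbers 
above and is not stated in this PR. The class and method names below are 
hypothetical and are not the helper the integration tests actually use.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the retry helper used by the Druid ITs.
final class RetrySketch {

    // Total wait budget in seconds = number of retries * sleep between retries.
    static long totalTimeoutSeconds(int retryCount, long sleepSeconds) {
        return (long) retryCount * sleepSeconds;
    }

    // Polls the condition up to retryCount times, sleeping between attempts.
    // Returns true as soon as the condition holds, false if retries run out
    // or the thread is interrupted.
    static boolean retryUntilTrue(BooleanSupplier condition, int retryCount, long sleepMillis) {
        for (int attempt = 0; attempt < retryCount; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long assumedSleepSeconds = 5; // assumption: inferred from 10 min / 120 retries
        System.out.println("old budget (min): " + totalTimeoutSeconds(120, assumedSleepSeconds) / 60);
        System.out.println("new budget (min): " + totalTimeoutSeconds(240, assumedSleepSeconds) / 60);
    }
}
```

   Doubling the retry count rather than the sleep interval keeps the polling 
frequency the same, so a fast run still finishes as early as before.
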
   This PR also includes some nice-to-have changes:
   1) Update the API call for terminating the supervisor to use `/terminate`, 
since `/shutdown` is deprecated.
   2) Add a shutdown-task API call to the test teardown to make sure that all 
tasks are also properly cleaned up.
   3) Update some logging to provide more information when a test fails.
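
   For reference, a minimal sketch of the two Overlord endpoints involved, 
both called with POST. The paths follow the Druid Overlord API; the class 
name, base URL, and IDs are hypothetical, and the sketch assumes the IDs are 
already URL-safe. It only builds the URIs and does not send any request.

```java
import java.net.URI;

// Hypothetical helper for illustration; not the client used by the Druid ITs.
final class OverlordEndpoints {

    // POST here to terminate a supervisor (replaces the deprecated /shutdown).
    static URI terminateSupervisor(String overlordBaseUrl, String supervisorId) {
        return URI.create(overlordBaseUrl + "/druid/indexer/v1/supervisor/" + supervisorId + "/terminate");
    }

    // POST here to shut down a single task, e.g. during test teardown.
    static URI shutdownTask(String overlordBaseUrl, String taskId) {
        return URI.create(overlordBaseUrl + "/druid/indexer/v1/task/" + taskId + "/shutdown");
    }

    public static void main(String[] args) {
        String base = "http://localhost:8090"; // assumption: example Overlord address
        System.out.println(terminateSupervisor(base, "example_supervisor"));
        System.out.println(shutdownTask(base, "example_task"));
    }
}
```
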
   
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

