dkoepke opened a new pull request #12255: URL: https://github.com/apache/druid/pull/12255
### Description This fixes intermittent, spurious failures that we've observed in the Kinesis sharding integration tests due to Kinesis taking longer than the code expected to start a sharding operation. The method that's changed is part of the integration test suite and only used by the test cases that we've seen are flaky. #### Increase retries Prior to this change, the tests expected a sharding operation to start in 9 seconds (30 retries * 300ms delay/retry). This change bumps the number of retries to 100, giving Kinesis 30 seconds to start the sharding. We chose this value to ensure the existing tests always pass for us while still allowing them to fail reasonably fast if there's some integration logic error. This Amazon doesn't provide guidance on when an operation might start but does document that a ["scaling action could take a few minutes to complete".](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_UpdateShardCount.html) #### Clarify the condition This PR also makes a small, clarifying change to the condition used to determine if sharding has started. Instead of checking if the number of shards has increased (which was technically correct even if the test is reducing the number of shards due to a Kinesis implementation detail), we now just check if the shard count has changed. <hr> ##### Key changed/added classes in this PR * `org.apache.druid.testing.utils.KinesisAdminClient` <hr> This PR has: - [x] been self-reviewed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
