maytasm opened a new pull request #10517:
URL: https://github.com/apache/druid/pull/10517
Fix compaction integration test CI timeout

### Description

The compaction integration test intermittently times out. The Travis job either hangs until the 50-minute job timeout or is terminated after receiving no new output for 10 minutes. The cause is one test that submits and runs too many tasks at the same time. Specifically, the test submits range partitioning compaction tasks with 2 subtasks each (for a total of 6 tasks), which intermittently causes the cluster to fail. I am not sure what the _real_ maximum number of tasks is that the Druid cluster running in Travis can handle, and it probably also depends on the type of tasks. In any case, I changed the tasks to have only 1 subtask each (for a total of 4 tasks), and the intermittent failure is now fixed.

Below are Travis builds with 72 runs of the compaction integration test each:
- Without any change from this PR: about 25% of the runs failed due to the issue above. https://travis-ci.org/github/maytasm/druid/builds/735867518
- With the test that submits range partitioning compaction tasks with 2 subtasks each (6 tasks in total) removed: 100% passed. https://travis-ci.org/github/maytasm/druid/builds/736172442
- With the range partitioning compaction tasks changed to 1 subtask each: 100% passed. https://travis-ci.org/github/maytasm/druid/builds/736451204

This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.

- Fix compaction integration test CI timeout
- Update Integration test README on root cause of the test failure.
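The task totals in the description follow from simple arithmetic. Below is a minimal sketch under stated assumptions: each range partitioning compaction task is assumed to run as one parent (supervisor) task plus its subtasks, with each occupying a task slot, and the test is assumed to submit two such compaction tasks. The count of two is inferred from the 6 and 4 totals and is not stated in the PR. `peak_task_count` is a hypothetical helper, not part of Druid or its test suite.

```python
# Hypothetical helper, not Druid code: peak number of concurrently running
# tasks for a batch of parallel compaction tasks, ASSUMING each parent
# (supervisor) task holds one slot and each of its subtasks holds one more.
def peak_task_count(parent_tasks: int, subtasks_per_parent: int) -> int:
    return parent_tasks * (1 + subtasks_per_parent)


# Two parent tasks is an assumption inferred from the totals (6 and 4).
before = peak_task_count(parent_tasks=2, subtasks_per_parent=2)
after = peak_task_count(parent_tasks=2, subtasks_per_parent=1)
print(before, after)  # 6 4
```

Under these assumptions, going from 2 subtasks to 1 subtask per compaction task lowers the peak from 6 to 4 concurrent tasks. That is the reduction after which all Travis runs passed.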
