maytasm opened a new pull request #10517:
URL: https://github.com/apache/druid/pull/10517


   Fix compaction integration test CI timeout
   
   ### Description
   
   The compaction integration test intermittently times out (the Travis job 
hangs until the 50-minute timeout and/or Travis terminates the job after 10 
minutes without new output) because one of the tests submits and runs too many 
tasks at the same time. Specifically, the test submits range partitioning 
compaction tasks with 2 subtasks each (for a total of 6 tasks), which causes 
the cluster to fail intermittently. I am not sure what the _real_ maximum 
number of tasks is that the Druid cluster running in Travis can handle; it 
probably depends on the type of tasks as well. In any case, I changed the 
tasks to have only 1 subtask each (for a total of 4 tasks), and the 
intermittent failure is now fixed.
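   
   As a rough illustration of the task counts quoted above, here is a minimal 
sketch. It assumes each compaction task runs as one supervisor task plus its 
subtasks, and that two compaction tasks are in flight at once; neither is 
stated in this description, but this is the model that reproduces both totals 
(6 and 4). The function and constant names are hypothetical.

```python
def total_tasks(compaction_tasks: int, subtasks_per_task: int) -> int:
    """Count tasks under the assumed model: each compaction task
    contributes one supervisor task plus its subtasks."""
    if compaction_tasks < 0 or subtasks_per_task < 0:
        raise ValueError("counts must be non-negative")
    return compaction_tasks * (1 + subtasks_per_task)


# Assumption: two compaction tasks run at once. This value is inferred from
# the totals in the description, not taken from the test code.
ASSUMED_COMPACTION_TASKS = 2

before = total_tasks(ASSUMED_COMPACTION_TASKS, subtasks_per_task=2)
after = total_tasks(ASSUMED_COMPACTION_TASKS, subtasks_per_task=1)
print(f"before: {before} tasks, after: {after} tasks")
```

   Under these assumptions, reducing each task to 1 subtask removes two 
concurrent tasks from the cluster.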
   
   Please see the Travis builds below, each with 72 compaction integration 
test runs.
   - Without any change from this PR: about 25% failed due to the above 
issue. https://travis-ci.org/github/maytasm/druid/builds/735867518
   - With the test that submits range partitioning compaction tasks with 2 
subtasks each (for a total of 6 tasks) removed: 100% passed. 
https://travis-ci.org/github/maytasm/druid/builds/736172442
   - With the range partitioning compaction tasks changed to 1 subtask each: 
100% passed. 
https://travis-ci.org/github/maytasm/druid/builds/736451204
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   
   - Fix compaction integration test CI timeout
   - Update Integration test README on root cause of the test failure.




