maytasm opened a new pull request #9724: Add integration tests for kafka ingestion
URL: https://github.com/apache/druid/pull/9724

### Description

This PR adds the same integration test coverage for Kafka that we already have for Kinesis. As a reminder, the integration test coverage we now have for Kafka and Kinesis is:

1. Functional tests when Druid and Kafka/Kinesis are in a stable state
   - legacy parser
   - inputFormat
   - taskCount greater than 1
2. Functional tests when Druid is in an unstable state
   - losing nodes
   - stopping/starting the supervisor
3. Functional tests when Kafka/Kinesis is in an unstable state
   - adding partitions
   - removing partitions (Kinesis only, as we cannot decrease the partition count for Kafka)

To verify ingestion:
- Kafka/Kinesis lag should be minimal; the consumer should be able to pull off the queue at a rate comparable to the producer.
- Realtime queries work against the indexing tasks.
- Queries work when reading from historical segments (after handoff).
- Queries return the expected count/value/etc.

Additionally, for Kafka we have the above coverage both with Kafka producer transactions enabled and with them disabled.

Note that this PR includes all existing Kafka integration tests.
The original tests and the locations of the corresponding test methods after this PR's refactoring are:
- transaction enabled + legacy parser is now in ITKafkaIndexingServiceTransactionalParallelizedTest#testKafkaIndexDataWithLegacyParserStableState
- transaction disabled + legacy parser is now in ITKafkaIndexingServiceNonTransactionalParallelizedTest#testKafkaIndexDataWithLegacyParserStableState
- transaction enabled + inputFormat parser is now in ITKafkaIndexingServiceTransactionalParallelizedTest#testKafkaIndexDataWithInputFormatStableState
- transaction disabled + inputFormat parser is now in ITKafkaIndexingServiceNonTransactionalParallelizedTest#testKafkaIndexDataWithInputFormatStableState

Other important changes in this PR:
- KafkaAdminClient - Create stream, delete stream, etc. for Kafka
- KafkaEventWriter - Wrapper around the Kafka producer for generating test data
- Added useful mvn command flags for integration test development and debugging
- Added the framework/functionality for easily enabling certain test classes/packages to run in parallel (using the testng parallel framework to execute multiple test methods concurrently on multiple threads). This is added in this PR because all the Kafka integration tests (new + old) take 1 hour to 1 hour 15 minutes when run serially. Without the parallel functionality, the kafka-index test group would be the bottleneck of our Travis CI. Also, splitting up kafka-index doesn't really make sense, as logically the tests cover the same functionality. (Note: the tests take a long time because of lots of idle time, such as waiting for the supervisor to come up after a restart, a Druid node to come up after a restart, ingestion tasks to ingest data after it is inserted into the stream, etc.)
- Upgraded testng to 6.14.3 due to a testng bug in parallel test execution in the version we were using (https://github.com/cbeust/testng/issues/1660)
- Added a new test tag in integration-tests/src/test/resources/testng.xml for including tests that can be run in parallel.
- Removed DruidTestRunnerFactory and moved the setup/teardown logic into SuiteListener. The runner is executed for each test tag, while the SuiteListener is executed for each suite. We do not want the teardown to happen after the first test tag, as the second test tag would then fail to run. (We now have two test tags: the serialized tests and the parallelized tests.)

This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [x] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths.
- [x] added integration tests.
- [x] been tested in a test Druid cluster.
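
The note above about idle time is the reason parallel execution helps so much here. As a rough illustration only, the following sketch uses plain `java.util.concurrent` rather than testng, and the names `ParallelIdleTestsSketch`, `idleTest`, `runSerially`, and `runInParallel` are hypothetical and not part of this PR. Each simulated test method just sleeps, standing in for waiting on a supervisor or node restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelIdleTestsSketch {

    // Hypothetical stand-in for a test method that is dominated by idle waiting
    // (e.g. waiting for a supervisor or a node to come back up).
    public static Callable<String> idleTest(String name, long idleMillis) {
        return () -> {
            Thread.sleep(idleMillis);
            return name;
        };
    }

    // Runs the tests one after another and returns the elapsed wall-clock time in ms.
    public static long runSerially(List<Callable<String>> tests) throws Exception {
        long start = System.nanoTime();
        for (Callable<String> test : tests) {
            test.call();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Runs the tests on a fixed-size thread pool and returns the elapsed time in ms.
    public static long runInParallel(List<Callable<String>> tests, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            // invokeAll blocks until every task has completed.
            List<Future<String>> results = pool.invokeAll(tests);
            for (Future<String> result : results) {
                result.get(); // rethrows any failure from a task
            }
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> tests = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            tests.add(idleTest("test-" + i, 200));
        }
        System.out.println("serial ms:   " + runSerially(tests));
        System.out.println("parallel ms: " + runInParallel(tests, 4));
    }
}
```

Because the simulated methods spend their time waiting rather than computing, the parallel run should finish in roughly the time of the slowest method instead of the sum of all of them. The real tests also share Druid and Kafka resources, which this sketch does not model.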
