maytasm opened a new pull request #9724: Add integration tests for kafka 
ingestion
URL: https://github.com/apache/druid/pull/9724
 
 
   Add integration tests for kafka ingestion
   
   ### Description
   
   This PR adds the same integration test coverage for Kafka that we already 
have for Kinesis. 
   As a reminder, the integration test coverage we now have for Kafka and 
Kinesis is:
   
   1. Functional tests when Druid and Kafka/Kinesis are in stable state
   - legacy parser
   - inputFormat
   - taskCount greater than 1
   
   2. Functional tests when Druid is in an unstable state 
   - losing nodes
   - Stop/start supervisor
   
   3. Functional tests when Kafka/Kinesis is in an unstable state 
   - adding partitions
   - removing partitions (Kinesis only, as the number of partitions of a Kafka 
topic cannot be decreased)
   
   To verify ingestion:
   - Kafka/Kinesis lag should be minimal: the consumer should be able to pull 
off the queue at a rate comparable to the producer's.
   - Realtime queries work against the indexing tasks
   - Queries work when reading from historical segments (after handoff)
   - Queries return expected count/value/etc.
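
   The checks above are polling-style assertions: the test keeps retrying until the consumer has caught up or a retry budget runs out. Below is a minimal sketch of that idea in plain Java. The class name, `retryUntilTrue`, `lagIsMinimal`, and the simulated consumer are hypothetical and are not taken from this PR or from Druid's test utilities.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names and numbers are illustrative, not from this PR.
class IngestionVerificationSketch {

    // Re-evaluates the condition up to maxAttempts times, sleeping between attempts.
    // Returns true as soon as the condition holds, false if the budget is exhausted.
    static boolean retryUntilTrue(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    // Lag is the number of produced events that have not been consumed yet.
    static boolean lagIsMinimal(long produced, long consumed, long maxLag) {
        return produced - consumed <= maxLag;
    }

    public static void main(String[] args) {
        final long produced = 1000;
        final long[] consumed = {0};
        // Simulated consumer: ingests 400 more events each time the check runs.
        boolean caughtUp = retryUntilTrue(() -> {
            consumed[0] = Math.min(produced, consumed[0] + 400);
            return lagIsMinimal(produced, consumed[0], 0);
        }, 5, 10);
        System.out.println("caught up: " + caughtUp);
    }
}
```

   In a real test the condition would query the stream's lag or run a Druid query instead of updating a counter.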
   
   Additionally, for Kafka we have the above coverage both with transactions 
enabled and with transactions disabled on the Kafka producer. Note that this PR 
keeps all existing Kafka integration tests. The original tests and the 
locations of the corresponding test methods after this refactoring are:
   - transaction enabled + legacy parser is now in 
ITKafkaIndexingServiceTransactionalParallelizedTest#testKafkaIndexDataWithLegacyParserStableState
   - transaction disabled + legacy parser is now in 
ITKafkaIndexingServiceNonTransactionalParallelizedTest#testKafkaIndexDataWithLegacyParserStableState
   - transaction enabled + inputFormat parser is now in 
ITKafkaIndexingServiceTransactionalParallelizedTest#testKafkaIndexDataWithInputFormatStableState
   - transaction disabled + inputFormat parser is now in 
ITKafkaIndexingServiceNonTransactionalParallelizedTest#testKafkaIndexDataWithInputFormatStableState
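
   For readers unfamiliar with the two producer modes, the difference largely comes down to producer configuration. The sketch below only builds `java.util.Properties` using standard Kafka producer config keys; the class name, the broker address, and the transactional id are assumptions for illustration, and this is not the configuration used by the tests in this PR.

```java
import java.util.Properties;

// Hypothetical sketch: shows standard Kafka producer config keys only.
class KafkaProducerConfigSketch {

    static Properties producerProperties(String bootstrapServers,
                                         boolean transactionEnabled,
                                         String transactionalId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("acks", "all");
        if (transactionEnabled) {
            // A transactional producer needs a transactional.id and idempotence.
            // With a real KafkaProducer, writes would additionally be wrapped in
            // initTransactions() / beginTransaction() / commitTransaction().
            props.setProperty("enable.idempotence", "true");
            props.setProperty("transactional.id", transactionalId);
        }
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "it-tx-producer" are placeholder values.
        System.out.println(producerProperties("localhost:9092", true, "it-tx-producer"));
        System.out.println(producerProperties("localhost:9092", false, null));
    }
}
```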
   
   Other important changes in this PR:
   - KafkaAdminClient - Create stream, delete stream, etc. for Kafka
   - KafkaEventWriter - Wrapper around Kafka producer for generating test data
   - Added useful mvn command flags for integration test development and 
debugging
   - Added the framework/functionality for easily enabling certain test 
classes/packages to be run in parallel (using the testng parallel framework to 
execute multiple test methods concurrently on multiple threads). This is 
added in this PR because all the Kafka integration tests (new + old), when run 
serially, take 1 hour to 1 hour 15 minutes. Without the parallel functionality, 
the kafka-index test group would be the bottleneck of our Travis CI. Also, 
splitting up kafka-index doesn't really make sense, as logically the tests 
cover the same functionality. (Note: the tests take a long time because of lots 
of idle time, such as waiting for the supervisor to come up after a restart, a 
Druid node to come up after a restart, ingestion tasks to ingest data after it 
is inserted into the stream, etc.)
       - Upgraded testng to 6.14.3 due to a testng bug in parallel test 
execution in the version we were using 
(https://github.com/cbeust/testng/issues/1660)
       - Added a new test tag in 
integration-tests/src/test/resources/testng.xml for including tests that can be 
run in parallel.
       - Removed DruidTestRunnerFactory and moved the setup/teardown logic into 
SuiteListener. The runner is executed for each test tag, while the 
SuiteListener is executed once per suite. We do not want the teardown to happen 
after the first test tag, as the second test tag would then fail to run. (We 
now have two test tags: the serialized tests and the parallelized tests.)
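
   To illustrate why running test methods on multiple threads helps when most of the time is spent idle, here is a minimal sketch in plain Java. It does not use testng; the class name, `idleTest`, `runAll`, and the sleep durations are hypothetical stand-ins for test methods that mostly wait on a supervisor or node restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: simulates idle-heavy tests with Thread.sleep.
class ParallelIdleTestsSketch {

    // A stand-in for a test method that spends its time waiting.
    static Callable<String> idleTest(String name, long idleMillis) {
        return () -> {
            Thread.sleep(idleMillis);
            return name;
        };
    }

    // Runs all tests on a fixed-size pool and returns their names in submission order.
    static List<String> runAll(List<Callable<String>> tests, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<String> finished = new ArrayList<>();
            for (Future<String> result : pool.invokeAll(tests)) {
                finished.add(result.get());
            }
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> tests = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            tests.add(idleTest("test-" + i, 200));
        }
        long start = System.nanoTime();
        runAll(tests, 1); // one thread: the waits add up
        long serialMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        runAll(tests, 4); // four threads: the waits overlap
        long parallelMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("serial ms: " + serialMillis + ", parallel ms: " + parallelMillis);
    }
}
```

   Because the simulated tests only sleep, the parallel run should take roughly as long as the slowest single test rather than the sum of all of them.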
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths.
   - [x] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   
