greyp9 opened a new pull request, #8463:
URL: https://github.com/apache/nifi/pull/8463
# Background
Previous iterations of support for Kafka client versions in NiFi (1.0, 2.0,
2.6) duplicated code from existing Kafka processors into new Maven modules,
adjusted Kafka client library dependencies for the new modules, and adjusted
for API differences as needed. The original JIRA associated with NiFi support
for Kafka 3 (`NIFI-9330`), followed this same approach. After discussion, a new
approach was chosen, and a new JIRA (`NIFI-11259`) created.
- Refactor Kafka client library dependencies into a new controller service.
- Expose a service API that was agnostic of any particular Kafka version.
- Create new processor implementations that interacted with Kafka through
the service API.
In particular, the new processor module should have no Kafka dependencies.
This is expected to ease the transition to future Kafka versions.
- A new 3.0++ controller service might be created to isolate any major API
changes to the Kafka client APIs.
- The new `PublishKafka` and `ConsumeKafka` processors would need minimal /
no changes to enable interactivity with the new controller service.
Other refactoring activities have been undertaken at the same time:
- The new `PublishKafka` processor is intended as a replacement for the
existing `PublishKafka_2_6` and `PublishKafkaRecord_2_6` processor pair. It
provides FlowFile-based or record-based data handling modes, controlled via a
per-processor property. Similarly, `ConsumeKafka` replaces both
`ConsumeKafka_2_6` and `ConsumeKafkaRecord_2_6`. This design adjustment reduces
the code duplication that was present in the 2.6 processor set.
# New Modules
- `nifi-kafka-service-api` - API contract for `KafkaConnectionService`,
which exposes access to instances of `KafkaConsumerService` and
`KafkaProducerService` in a manner agnostic to a particular version of Kafka
- `KafkaProducerService` - intermediary logical service brokering
interactions with the producer APIs of the Kafka client libraries
- `KafkaConsumerService` - intermediary logical service brokering
interactions with the producer APIs of the Kafka client libraries
- `nifi-kafka-service-api-nar` - NiFi NAR wrapper for the
`KafkaConnectionService` API contract
- `nifi-kafka-3-service` - home for `Kafka3ConnectionService`, which
abstracts Kafka dependencies away from the new Kafka processors, and manages
runtime connections to a remote Kafka 3 service instance
- `nifi-kafka-3-service-nar` - NiFi NAR wrapper for `Kafka3ConnectionService`
- `nifi-kafka-processors` - home for `PublishKafka` and `ConsumeKafka`
processors, which allow interactivity with remote Kafka service instances
agnostic to a particular Kafka version
- `nifi-kafka-nar` - NiFi NAR wrapper for the `PublishKafka` and
`ConsumeKafka` processors
- `nifi-kafka-2-6-integration` - test bed to establish runtime behavior
(testcontainers/kafka) of Kafka 2.6 processors for certain conditions, in order
to better replicate those behaviors
- `nifi-kafka-3-integration` - testing infrastructure to test new processors
/ controller service while communicating with an actual (testcontainers) Kafka
instance
- `nifi-kafka-jacoco` - module home for configuration to instrument
executions of `nifi-kafka-bundle` unit tests / integration tests, in order to
assess test coverage
# Notes
- Integration tests are employed as a "first-class citizen" means of testing
expected interactions with Kafka instances, running in `testcontainers`.
- https://github.com/testcontainers/testcontainers-java
- It is possible to use a single instance of testcontainers/kafka per Maven
module, in order to incur the startup/teardown cost only once. The intent is to
employ this strategy where feasible.
- Instances of `simplelogger.properties` have been useful during
development, but these would be removed before PR merge.
- I’ve done a significant amount of runtime testing against containerized
Kafka instances using the repo/branch below. This resource may be useful for
others who want to do runtime testing without the need for fixed Kafka
infrastructure.
- https://github.com/greyp9/kafka-images/tree/NIFI-12194
- There are opportunities to refactor common methods and declarations into
the `nifi-kafka-shared` module; I’ve avoided that during development to ease
the process of rebasing to `nifi/main`.
- It is likely that this set of new components will co-exist with the
existing Kafka 2.6 based processors for some period of time.
- Allow for migration of existing flows to use the new components.
- Slight behavioral differences might be anticipated during the "burn in"
phase of the new components, due to the scope of work.
- Code compatibility with JRE 8 has been targeted, to leave open the option
of backporting this work to the support/1.x branch.
- The PR as is should support the following authentication strategies:
`PLAINTEXT`, `SSL`, `SASL_PLAINTEXT`, and `SASL_SSL`. Support for additional
authentication strategies could be handled via follow on JIRAs.
- Support for the `KafkaRecordSink` form factor could be handled via follow
on JIRAs.
- Support for Kafka "exactly once" semantics could be handled via follow on
JIRAs.
- Migration documentation could be handled via follow on JIRAs.
### Issue Tracking
- [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue
created
### Pull Request Tracking
- [x] Pull Request title starts with Apache NiFi Jira issue number, such as
`NIFI-00000`
- [x] Pull Request commit message starts with Apache NiFi Jira issue number,
as such `NIFI-00000`
### Pull Request Formatting
- [x] Pull Request based on current revision of the `main` branch
- [x] Pull Request refers to a feature branch with one commit containing
changes
# Verification
Please indicate the verification steps performed prior to pull request
creation.
### Build
- [x] Build completed using `mvn clean install -P contrib-check`
- [x] JDK 21
### Licensing
- [ ] New dependencies are compatible with the [Apache License
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License
Policy](https://www.apache.org/legal/resolved.html)
- [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE`
files
### Documentation
- [ ] Documentation formatting appears as expected in rendered files
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]