PatrickRen opened a new pull request #15531:
URL: https://github.com/apache/flink/pull/15531
## What is the purpose of the change
- This pull request refactors the logic of partition discovery in
```KafkaSourceEnumerator```, and changes the interface ```KafkaSubscriber``` to
separate topic description fetching and discover partition changes into two
methods.
## Brief change log
- Rework the logic of partition discovery. Currently the flow of partition
discovery is:
1. The worker thread fetches descriptions of subscribed topics/partitions,
then hands over to coordinator thread
2. The coordinator thread filters out already discovered partitions (pending
+ assigned partitions), then invokes worker thread with callAsync to fetch
offsets for new partitions
3. The worker thread fetches offsets and creates splits for new partitions,
then hands over new splits to coordinator thread
4. The coordinator thread marks these splits as pending and try to make
assignment.
- Add new method ```getSubscribedTopicDescriptions(AdminClient)``` in
```KafkaSubscriber``` for fetching subscribed topic descriptions
- Change method ```getPartitionChanges(AdminClient, Set)``` to
```getPartitionChanges(Map, Set)``` in ```KafkaSubscriber``` for coordinator
thread to discover partition changes
## Verifying this change
This change is already covered by existing tests of the Kafka Source.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]