Nicholas Johnson created BEAM-13987:
---------------------------------------
Summary: Adding regex-based subscriptions to KafkaIO dynamic read.
Key: BEAM-13987
URL: https://issues.apache.org/jira/browse/BEAM-13987
Project: Beam
Issue Type: Improvement
Components: io-java-kafka
Reporter: Nicholas Johnson
BEAM-11325 (https://issues.apache.org/jira/browse/BEAM-11325) added the ability
for KafkaIO to read from topics dynamically.
The design document for that change also discussed using a regex to subscribe
to topics dynamically:
[https://docs.google.com/document/d/1FU3GxVRetHPLVizP3Mdv6mP5tpjZ3fd99qNjUI5DT5k/edit?disco=AAAALGbMoak]
That discussion points out that subscribing to all topics in a cluster isn't
particularly useful for most Kafka users, whereas pattern-based subscription is
very common:
[https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#subscribe-java.util.regex.Pattern-]
[https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFn.java#L113]
The current implementation (linked above) does not use a pattern in any way to
subscribe to topics. The comments on the document mention piggybacking on the
existing functionality in KafkaConsumer's subscribe method.
However, piggybacking on the existing consumer method is made difficult by the
per-partition subscription method used by Beam.
I believe a simple solution exists, though:
[https://github.com/apache/beam/blob/5345834a86c422347556e0bce7e5dd20e4854e44/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1183]
As part of the withDynamicRead method, allow the option to pass a Pattern.
Then, as WatchKafkaTopicPartitionDoFn does now, call listTopics and match the
returned topic names against the supplied pattern.
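
A minimal sketch of the matching step described above, to make the proposal
concrete. It is not Beam code: TopicPatternFilter and matchingPartitions are
hypothetical names, and the Map of topic name to partition ids is only a
stand-in for what KafkaConsumer#listTopics() returns. Matching against the whole
topic name (rather than a substring) is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Beam or Kafka.
final class TopicPatternFilter {
  private TopicPatternFilter() {}

  /**
   * Returns "topic-partition" labels for every topic whose full name matches
   * the pattern. topicsToPartitions stands in for the listTopics() result,
   * reduced to topic name -> partition ids.
   */
  static List<String> matchingPartitions(
      Map<String, List<Integer>> topicsToPartitions, Pattern topicPattern) {
    List<String> matched = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> entry : topicsToPartitions.entrySet()) {
      // matches() requires the whole topic name to match, not just a substring.
      if (topicPattern.matcher(entry.getKey()).matches()) {
        for (Integer partition : entry.getValue()) {
          matched.add(entry.getKey() + "-" + partition);
        }
      }
    }
    // Sort lexicographically so the example output is deterministic.
    Collections.sort(matched);
    return matched;
  }
}
```

Because the filter runs on each listTopics call, topics created later that
match the pattern would be picked up on the next poll, while the per-partition
assignment used by Beam stays unchanged.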
--
This message was sent by Atlassian Jira
(v8.20.1#820001)