This is an automated email from the ASF dual-hosted git repository.

abhishek pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
     new b97cc45d81 Add clarification to the docs for multi-topic Kafka ingestion (#14847)
b97cc45d81 is described below

commit b97cc45d81337392a449f32764346a5842f65c5b
Author: Abhishek Agarwal <[email protected]>
AuthorDate: Thu Aug 17 12:52:06 2023 +0530

    Add clarification to the docs for multi-topic Kafka ingestion (#14847)
    
    Follow-up to #14828. Added some more clarification about how topicPattern is used.
---
 .../extensions-core/kafka-supervisor-reference.md   | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/docs/development/extensions-core/kafka-supervisor-reference.md b/docs/development/extensions-core/kafka-supervisor-reference.md
index 0ec1ebd033..95d6b58018 100644
--- a/docs/development/extensions-core/kafka-supervisor-reference.md
+++ b/docs/development/extensions-core/kafka-supervisor-reference.md
@@ -37,8 +37,8 @@ This topic contains configuration reference information for the Apache Kafka sup
 
 |Field|Type|Description|Required|
 |-----|----|-----------|--------|
-|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single kafka topic.|yes|
-|`topicPattern`|String|A regex pattern that can used to select multiple kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes|
+|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single Kafka topic.|yes, only if `topicPattern` is not set|
+|`topicPattern`|String|A regex pattern that can used to select multiple Kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes, only if `topic` is not set|
 |`inputFormat`|Object|`inputFormat` to define input data parsing. See [Specifying data format](#specifying-data-format) for details about specifying the input format.|yes|
 |`consumerProperties`|Map<String, Object>|A map of properties to pass to the Kafka consumer. See [More on consumer properties](#more-on-consumerproperties).|yes|
 |`pollTimeout`|Long|The length of time to wait for the Kafka consumer to poll records, in milliseconds|no (default == 100)|
@@ -148,14 +148,15 @@ Multiple topics can be passed as a regex pattern as the value for `topicPattern`
 ingest data from clicks and impressions, you will set `topicPattern` to `clicks|impressions` in the IO config.
 Similarly, you can use `metrics-.*` as the value for `topicPattern` if you want to ingest from all the topics that
 start with `metrics-`. If new topics are added to the cluster that match the regex, Druid will automatically start
-ingesting from those new topics. If you enable multi-topic ingestion for a datasource, downgrading to a version
-lesser than 28.0.0 will cause the ingestion for that datasource to fail.
-
-When ingesting data from multiple topics, the partitions are assigned based on the hashcode of topic and the id of the 
-partition within that topic. The partition assignment might not be uniform across all the tasks. It's also assumed 
-that partitions across individual topics have similar load. It is recommended that you have a higher number of 
-partitions for a high load topic and a lower number of partitions for a low load topic. Assuming that you want to 
-ingest from both high and low load topic in the same supervisor. 
+ingesting from those new topics. A topic name that only matches partially such as `my-metrics-12` will not be
+included for ingestion. If you enable multi-topic ingestion for a datasource, downgrading to a version older than
+28.0.0 will cause the ingestion for that datasource to fail.
+
+When ingesting data from multiple topics, partitions are assigned based on the hashcode of the topic name and the
+id of the partition within that topic. The partition assignment might not be uniform across all the tasks. It's also
+assumed that partitions across individual topics have similar load. It is recommended that you have a higher number of
+partitions for a high load topic and a lower number of partitions for a low load topic. Assuming that you want to
+ingest from both high and low load topic in the same supervisor.
 
 ## More on consumerProperties
 
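Editor's note on the `topicPattern` matching rule described in the diff above: the statement that `my-metrics-12` is not included for `metrics-.*` implies the whole topic name must match the pattern, not just a substring. The following is a minimal sketch of that rule, not Druid code. It uses Python's `re.fullmatch` as a stand-in for the Java regex matching Druid would perform, and the helper name `matches_topic_pattern` and the topic list are hypothetical.

```python
import re


def matches_topic_pattern(topic_pattern: str, topic: str) -> bool:
    """Return True only if the entire topic name matches the pattern.

    Hypothetical helper: Python's re.fullmatch stands in for the
    whole-name matching that the docs describe for topicPattern.
    """
    return re.fullmatch(topic_pattern, topic) is not None


# Hypothetical topic names, chosen to mirror the examples in the docs.
topics = ["metrics-cpu", "metrics-12", "my-metrics-12", "clicks", "impressions"]

# Only names that match "metrics-.*" from start to end are selected.
selected = [t for t in topics if matches_topic_pattern("metrics-.*", t)]
print(selected)  # ['metrics-cpu', 'metrics-12']
```

With the same helper, `clicks|impressions` selects exactly the two topics `clicks` and `impressions`, and a name such as `my-clicks` is rejected because the match must cover the full name.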

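Editor's note on the partition assignment described in the diff above: the docs say only that assignment depends on the hashcode of the topic name and the partition id, and that it may not be uniform across tasks. The sketch below illustrates why such a scheme can be uneven. The combining formula in `assign_task`, the Java-style string hash, and the topic names and partition counts are all assumptions made for illustration; the diff does not state the exact formula Druid uses.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit int (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def assign_task(topic: str, partition: int, task_count: int) -> int:
    """Map a (topic, partition) pair to a task index.

    Assumed formula for illustration only: it combines the topic-name
    hash with the partition id, as the docs describe in general terms.
    """
    return abs(31 * java_string_hash(topic) + partition) % task_count


# Hypothetical setup: a high-load topic with 6 partitions, a low-load one with 2.
partition_counts = {"clicks": 6, "impressions": 2}
task_count = 3

per_task = Counter(
    assign_task(topic, p, task_count)
    for topic, n in partition_counts.items()
    for p in range(n)
)
for task in range(task_count):
    print(f"task {task}: {per_task[task]} partition(s)")
```

Because each topic starts at a hash-dependent offset, the per-task totals need not be equal, which is consistent with the doc's caution that the assignment might not be uniform and that partition counts should reflect each topic's load.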
