This is an automated email from the ASF dual-hosted git repository.
abhishek pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new b97cc45d81 Add clarification to the docs for multi-topic Kafka
ingestion (#14847)
b97cc45d81 is described below
commit b97cc45d81337392a449f32764346a5842f65c5b
Author: Abhishek Agarwal <[email protected]>
AuthorDate: Thu Aug 17 12:52:06 2023 +0530
Add clarification to the docs for multi-topic Kafka ingestion (#14847)
Follow-up to #14828. Added some more clarification about how topicPattern
is used.
---
.../extensions-core/kafka-supervisor-reference.md | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/docs/development/extensions-core/kafka-supervisor-reference.md
b/docs/development/extensions-core/kafka-supervisor-reference.md
index 0ec1ebd033..95d6b58018 100644
--- a/docs/development/extensions-core/kafka-supervisor-reference.md
+++ b/docs/development/extensions-core/kafka-supervisor-reference.md
@@ -37,8 +37,8 @@ This topic contains configuration reference information for
the Apache Kafka sup
|Field|Type|Description|Required|
|-----|----|-----------|--------|
-|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use
this setting when you want to ingest from a single kafka topic.|yes|
-|`topicPattern`|String|A regex pattern that can used to select multiple kafka
topics to ingest data from. Either this or `topic` can be used in a spec. See
[Ingesting from multiple topics](#ingesting-from-multiple-topics) for more
details.|yes|
+|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use
this setting when you want to ingest from a single Kafka topic.|yes, only if
`topicPattern` is not set|
+|`topicPattern`|String|A regex pattern that can be used to select multiple Kafka
topics to ingest data from. Either this or `topic` can be used in a spec. See
[Ingesting from multiple topics](#ingesting-from-multiple-topics) for more
details.|yes, only if `topic` is not set|
|`inputFormat`|Object|`inputFormat` to define input data parsing. See
[Specifying data format](#specifying-data-format) for details about specifying
the input format.|yes|
|`consumerProperties`|Map<String, Object>|A map of properties to pass to the
Kafka consumer. See [More on consumer
properties](#more-on-consumerproperties).|yes|
|`pollTimeout`|Long|The length of time to wait for the Kafka consumer to poll
records, in milliseconds|no (default == 100)|
@@ -148,14 +148,15 @@ Multiple topics can be passed as a regex pattern as the
value for `topicPattern`
ingest data from clicks and impressions, you will set `topicPattern` to
`clicks|impressions` in the IO config.
Similarly, you can use `metrics-.*` as the value for `topicPattern` if you
want to ingest from all the topics that
start with `metrics-`. If new topics are added to the cluster that match the
regex, Druid will automatically start
-ingesting from those new topics. If you enable multi-topic ingestion for a
datasource, downgrading to a version
-lesser than 28.0.0 will cause the ingestion for that datasource to fail.
-
-When ingesting data from multiple topics, the partitions are assigned based on
the hashcode of topic and the id of the
-partition within that topic. The partition assignment might not be uniform
across all the tasks. It's also assumed
-that partitions across individual topics have similar load. It is recommended
that you have a higher number of
-partitions for a high load topic and a lower number of partitions for a low
load topic. Assuming that you want to
-ingest from both high and low load topic in the same supervisor.
+ingesting from those new topics. A topic name that only matches partially,
such as `my-metrics-12`, will not be
+included for ingestion. If you enable multi-topic ingestion for a datasource,
downgrading to a version older than
+28.0.0 will cause the ingestion for that datasource to fail.
+
+When ingesting data from multiple topics, partitions are assigned based on the
hashcode of the topic name and the
+id of the partition within that topic. The partition assignment might not be
uniform across all the tasks. It's also
+assumed that partitions across individual topics have similar load. If you
want to ingest from both high and low load
+topics in the same supervisor, it is recommended that you have a higher number
of partitions for a high load topic
+and a lower number of partitions for a low load topic.
## More on consumerProperties
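
The two behaviors clarified in the diff above (whole-name matching for `topicPattern`, and partition assignment derived from the topic hashcode plus the partition id) can be sketched roughly as follows. This is an illustrative Python sketch, not Druid's code: Druid runs on the JVM and uses Java regular expressions, and the commit does not state the exact assignment formula. The helper names `select_topics`, `java_string_hash`, and `assign_task_group`, as well as the `31 * hash + partition` combination, are assumptions made for illustration only.

```python
import re


def select_topics(topic_pattern, topic_names):
    """Return the topic names that the pattern matches in full.

    A name that the pattern matches only partially (for example
    'my-metrics-12' against 'metrics-.*') is excluded, mirroring the
    behavior described in the doc change.
    """
    compiled = re.compile(topic_pattern)
    return [name for name in topic_names if compiled.fullmatch(name)]


def java_string_hash(text):
    """Emulate Java's String.hashCode() as a signed 32-bit integer.

    Accurate for characters in the Basic Multilingual Plane, which covers
    typical Kafka topic names.
    """
    value = 0
    for char in text:
        value = (31 * value + ord(char)) & 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def assign_task_group(topic, partition_id, task_count):
    """Map a (topic, partition) pair to a task group index.

    ASSUMPTION: the way the hash and the partition id are combined here is
    hypothetical; the source only says that both values are used.
    """
    return abs(31 * java_string_hash(topic) + partition_id) % task_count
```

For example, `select_topics("metrics-.*", ["metrics-cpu", "my-metrics-12"])` returns `["metrics-cpu"]`. Because the task group depends on a hash of the topic name, two topics with the same partition count can land on tasks unevenly, which is consistent with the note that assignment might not be uniform across all the tasks.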