aho135 commented on code in PR #19571: URL: https://github.com/apache/druid/pull/19571#discussion_r3384832372
##########
docs/ingestion/kafka-ingestion.md:
##########
@@ -263,6 +264,46 @@ The following example shows a supervisor spec with idle configuration enabled:
 ```
 </details>
 
+#### Partition filter dimensions
+
+When you set `partitionFilterDimensions` in the IO config, the supervisor tracks the distinct values observed for each listed dimension during ingestion. At segment publish time, each segment is annotated with only the values it actually ingested. The broker then uses these annotations to skip segments at query time when the query filter doesn't intersect the segment's declared values.
+
+This enables segment pruning for streaming-ingested data without waiting for compaction to produce hash or range-partitioned segments.
+
+**Usage guidelines:**
+
+- Use only low-to-medium cardinality dimensions (for example, `tenant_id`, `region`, `environment`). High-cardinality dimensions bloat segment metadata with no pruning benefit.
+- Most effective when Kafka partitions are keyed by the tracked dimension (for example, using tenant ID as the message key). Each task naturally sees a subset of values, and segments get tight filter annotations.
+- Also works with multiple supervisors reading from separate topics into one datasource.
+- After compaction, the `StreamRangeShardSpec` annotations are replaced by the compaction output's shard spec (hash or range partitioning), which provides its own pruning.

Review Comment:
Maybe worth mentioning that when using `partitionFilterDimensions`, dynamic compaction strategy should not be used
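
For context, the pruning check described in the added doc text can be sketched roughly as follows. This is only an illustration of the idea (skip a segment when the query filter does not intersect its declared values), not Druid's broker code: `SegmentAnnotation`, `can_skip`, the segment IDs, and the sample tenant values are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentAnnotation:
    """Hypothetical stand-in for the per-segment values recorded at publish time."""
    segment_id: str
    # dimension name -> distinct values the segment actually ingested
    declared_values: dict[str, frozenset[str]] = field(default_factory=dict)


def can_skip(segment: SegmentAnnotation, query_filter: dict[str, set[str]]) -> bool:
    """Return True if some filtered dimension has no overlap with the segment's declared values."""
    for dimension, wanted in query_filter.items():
        declared = segment.declared_values.get(dimension)
        if declared is None:
            # Dimension is not annotated on this segment, so it cannot be pruned on it.
            continue
        if declared.isdisjoint(wanted):
            return True
    return False


segments = [
    SegmentAnnotation("seg-a", {"tenant_id": frozenset({"acme", "globex"})}),
    SegmentAnnotation("seg-b", {"tenant_id": frozenset({"initech"})}),
    SegmentAnnotation("seg-c"),  # no annotation at all (assumed case: must always be scanned)
]

query_filter = {"tenant_id": {"acme"}}
to_scan = [s.segment_id for s in segments if not can_skip(s, query_filter)]
print(to_scan)  # ['seg-a', 'seg-c']
```

In this sketch an unannotated segment is never skipped, which is the conservative choice: pruning only happens when the annotation proves the segment cannot match.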
