jaykanakiya commented on code in PR #19596:
URL: https://github.com/apache/druid/pull/19596#discussion_r3440381993


##########
docs/ingestion/kafka-ingestion.md:
##########
@@ -273,6 +273,7 @@ This enables segment pruning for streaming-ingested data without waiting for com
 
 - Only string-typed dimensions are currently supported.
 - Use only low-to-medium cardinality dimensions (for example, `tenant_id`, `region`, `environment`). High-cardinality dimensions bloat segment metadata with no pruning benefit.
+- Set `maxValuesPerDimension` as a safety cap if a tracked dimension may unexpectedly grow high-cardinality. When a segment's observed distinct values for a dimension exceed the cap, that dimension is omitted from the segment's stamped filter map: pruning is disabled for that dimension on that segment, but other tracked dimensions continue to prune as normal. Default is unset (uncapped).

Review Comment:
   Updated the text. 
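
To make the documented cap semantics concrete, here is a minimal sketch, not the code in this PR. It assumes the stamped filter map can be modeled as a plain `Map<String, Set<String>>`; the class `FilterMapCapSketch` and the method `stampFilterMap` are hypothetical names used only for illustration. A dimension whose observed distinct values exceed the cap is dropped from the map, while the other dimensions are kept.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Druid's implementation.
public class FilterMapCapSketch
{
  // Returns the dimensions that stay in the (assumed) stamped filter map.
  // A null cap means "uncapped", matching the documented default.
  static Map<String, Set<String>> stampFilterMap(
      Map<String, Set<String>> observedValues,
      Integer maxValuesPerDimension
  )
  {
    Map<String, Set<String>> stamped = new TreeMap<>();
    for (Map.Entry<String, Set<String>> entry : observedValues.entrySet()) {
      boolean overCap = maxValuesPerDimension != null
                        && entry.getValue().size() > maxValuesPerDimension;
      if (!overCap) {
        // Within the cap: keep the dimension so it can still be used for pruning.
        stamped.put(entry.getKey(), new TreeSet<>(entry.getValue()));
      }
      // Over the cap: omit the dimension; other dimensions are unaffected.
    }
    return stamped;
  }

  public static void main(String[] args)
  {
    Map<String, Set<String>> observed = new HashMap<>();
    observed.put("region", new HashSet<>(Arrays.asList("us", "eu")));
    observed.put("tenant_id", new HashSet<>(Arrays.asList("a", "b", "c")));
    // With a cap of 2, "tenant_id" (3 values) is omitted and "region" is kept.
    System.out.println(stampFilterMap(observed, 2));
  }
}
```

With a `null` cap, the same call keeps both dimensions.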



##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/StreamingPartitionsSpec.java:
##########
@@ -41,13 +42,29 @@
 public class StreamingPartitionsSpec
 {
   private final List<String> partitionDimensions;
+  @Nullable
+  private final Integer maxValuesPerDimension;
 
   @JsonCreator
   public StreamingPartitionsSpec(
-      @JsonProperty("partitionDimensions") @Nullable List<String> partitionDimensions
+      @JsonProperty("partitionDimensions") @Nullable List<String> partitionDimensions,
+      @JsonProperty("maxValuesPerDimension") @Nullable Integer maxValuesPerDimension
   )
   {
     this.partitionDimensions = partitionDimensions == null ? Collections.emptyList() : partitionDimensions;
+    if (maxValuesPerDimension != null) {
+      Preconditions.checkArgument(
+          maxValuesPerDimension > 0,
+          "maxValuesPerDimension must be > 0, got [%s]",
+          maxValuesPerDimension
+      );
+    }

Review Comment:
   Updated
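
For reference, the validation in the hunk above accepts `null` (uncapped) and any positive value, and rejects zero or negative values. The sketch below reproduces that behavior in plain Java without Guava; the class `MaxValuesValidationSketch` and the method `validate` are hypothetical names, and the exception type assumes the standard behavior of Guava's `Preconditions.checkArgument`, which throws `IllegalArgumentException`.

```java
// Hypothetical standalone illustration of the constructor check; not the PR's code.
public class MaxValuesValidationSketch
{
  // Mirrors the check in the diff: null is allowed, otherwise the value must be > 0.
  static Integer validate(Integer maxValuesPerDimension)
  {
    if (maxValuesPerDimension != null && maxValuesPerDimension <= 0) {
      throw new IllegalArgumentException(
          String.format("maxValuesPerDimension must be > 0, got [%s]", maxValuesPerDimension)
      );
    }
    return maxValuesPerDimension;
  }

  public static void main(String[] args)
  {
    System.out.println(validate(null)); // unset: no cap is applied
    System.out.println(validate(100));  // positive cap is accepted
    try {
      validate(0);                      // zero is rejected
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```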



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

