Re: [PR] Add Kafka ingestion support for subset partitions [pinot]

via GitHub Wed, 28 Jan 2026 05:02:28 -0800


Copilot commented on code in PR #17587:
URL: https://github.com/apache/pinot/pull/17587#discussion_r2736534887



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/assignment/instance/InstanceReplicaGroupPartitionSelector.java:
##########
@@ -446,13 +478,16 @@ private void replicaGroupBasedMinimumMovement(Map<Integer, List<InstanceConfig>>
                   instanceToNumPartitionsMap.put(existingInstance, numPartitionsOnInstance + 1);
                 }
               }
+            } else {
+              partitionIdToExistingInstancesMap.add(List.of());
             }

Review Comment:
   The `else` clause that adds an empty list to
`partitionIdToExistingInstancesMap` belongs in the original loop structure
starting at line 456. As written, it duplicates the pattern from line 470, where
existing instances are added conditionally. Consider moving this initialization
so that the list size matches `partitionIds.size()` consistently throughout the
method.
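
The size invariant described above can be illustrated with a minimal, standalone sketch. It is not the code from the PR: the class name, the method name, and the use of plain strings in place of `InstanceConfig` are all assumptions made for illustration. The point it shows is that a single loop adds exactly one entry per partition id, so index `i` of the result always lines up with `partitionIds.get(i)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the logic under review; instances are plain strings here.
final class ExistingInstancesAlignmentSketch {
  private ExistingInstancesAlignmentSketch() {
  }

  /**
   * Returns one entry per partition id, in the same order as partitionIds.
   * Partitions without existing instances get an empty list, so the result
   * size always equals partitionIds.size().
   */
  static List<List<String>> alignExistingInstances(List<Integer> partitionIds,
      Map<Integer, List<String>> existingInstancesByPartition) {
    List<List<String>> aligned = new ArrayList<>(partitionIds.size());
    for (Integer partitionId : partitionIds) {
      List<String> existing = existingInstancesByPartition.get(partitionId);
      // Add exactly one element per iteration; a missing partition becomes an empty list.
      aligned.add(existing != null ? List.copyOf(existing) : List.of());
    }
    return aligned;
  }
}
```

Handling the empty case in the same statement that handles the populated case avoids a separate `else` branch that could drift out of sync with the main path.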



##########
pinot-plugins/pinot-stream-ingestion/pinot-kafka-3.0/src/main/java/org/apache/pinot/plugin/stream/kafka30/KafkaStreamMetadataProvider.java:
##########
@@ -96,6 +133,56 @@ public Set<Integer> fetchPartitionIds(long timeoutMillis) {
     }
   }
 
+  @Override
+  public List<PartitionGroupMetadata> computePartitionGroupMetadata(String clientId, StreamConfig streamConfig,
+      List<PartitionGroupConsumptionStatus> partitionGroupConsumptionStatuses, int timeoutMillis)
+      throws IOException, java.util.concurrent.TimeoutException {
+    Optional<List<Integer>> subsetOpt =
+        KafkaPartitionSubsetUtils.getPartitionIdsFromConfig(_config.getStreamConfigMap());
+    if (subsetOpt.isEmpty()) {
+      return StreamMetadataProvider.super.computePartitionGroupMetadata(clientId, streamConfig,
+          partitionGroupConsumptionStatuses, timeoutMillis);
+    }
+    List<Integer> subset = subsetOpt.get();
+    Set<Integer> topicIds = fetchPartitionIds(timeoutMillis);
+    Map<Integer, StreamPartitionMsgOffset> consumptionByPartition = new HashMap<>();
+    for (PartitionGroupConsumptionStatus s : partitionGroupConsumptionStatuses) {
+      consumptionByPartition.put(s.getStreamPartitionGroupId(), s.getEndOffset());
+    }
+    StreamConsumerFactory streamConsumerFactory = StreamConsumerFactoryProvider.create(streamConfig);
+    List<PartitionGroupMetadata> result = new ArrayList<>(subset.size());
+    for (Integer partitionId : subset) {
+      if (!topicIds.contains(partitionId)) {
+        LOGGER.warn(
+            "Configured partition id {} does not exist in topic {} when computing partition group metadata. "
+                + "Topic partitions: {}",

Review Comment:
   The warning message in Kafka 3.0 is less detailed than Kafka 2.0 (lines 
157-160 in the 2.0 version). The Kafka 2.0 version includes additional context: 
"This indicates that topic partitions may have changed between validation and 
metadata computation. Skipping this partition." This extra explanation should 
be added to the Kafka 3.0 version for consistency and clarity.
   ```suggestion
                   + "Topic partitions: {}. This indicates that topic partitions may have changed between "
                   + "validation and metadata computation. Skipping this partition.",
   ```
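
For readers without the full diff, the warn-and-skip behavior discussed above can be sketched in isolation. This is an illustrative approximation, not the PR's implementation: the class and method names are hypothetical, warnings are collected into a list instead of going through a logger so the example stays self-contained, and the message text simply mirrors the wording proposed in the suggestion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps only configured partition ids that the topic actually has.
final class PartitionSubsetFilterSketch {
  private PartitionSubsetFilterSketch() {
  }

  static List<Integer> retainExistingPartitions(String topicName, List<Integer> configuredSubset,
      Set<Integer> topicPartitionIds, List<String> warnings) {
    List<Integer> retained = new ArrayList<>(configuredSubset.size());
    for (Integer partitionId : configuredSubset) {
      if (!topicPartitionIds.contains(partitionId)) {
        // Record a warning and skip the partition rather than failing the whole computation.
        warnings.add(String.format(
            "Configured partition id %d does not exist in topic %s when computing partition group metadata. "
                + "Topic partitions: %s. This indicates that topic partitions may have changed between "
                + "validation and metadata computation. Skipping this partition.",
            partitionId, topicName, topicPartitionIds));
        continue;
      }
      retained.add(partitionId);
    }
    return retained;
  }
}
```

Keeping the explanatory sentence in the message matters mainly for operators: without it, a skipped partition looks like a configuration error rather than a race between validation and metadata computation.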



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

