Re: [PR] [FLINK-38681] Revise threading model in enumerator [flink-connector-kafka]

via GitHub Tue, 18 Nov 2025 11:07:48 -0800


Savonitar commented on code in PR #195:
URL: 
https://github.com/apache/flink-connector-kafka/pull/195#discussion_r2535143513



##########
flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/enumerator/KafkaSourceEnumerator.java:
##########
@@ -587,7 +627,7 @@ public static class PartitionOffsetsRetrieverImpl
         private final String groupId;
 
         public PartitionOffsetsRetrieverImpl(AdminClient adminClient, String 
groupId) {
-            this.adminClient = adminClient;
+            this.adminClient = checkNotNull(adminClient);

Review Comment:
   (irrelevant to this PR): 
   I noticed that `PartitionOffsetsRetrieverImpl` closes the `adminClient` in 
`close()` method. However, `PartitionOffsetsRetrieverImpl` doesn't "own" the 
`adminClient`, it is created and owned by `KafkaSourceEnumerator` which also 
closes it in itsown `close()` method. 
   
   Should we either remove the `close()` implementation from 
`PartitionOffsetsRetrieverImpl` (make it a no-op), OR remove `AutoCloseable` if 
it's not needed?  (probably in another PR).
   
   Looks like currently it's not an issue because 
`PartitionOffsetsRetrieverImpl` are never closed, but it's a potential future 
bug, for example, if someone adds the usage with try-with-resources.



##########
flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/enumerator/KafkaSourceEnumerator.java:
##########
@@ -331,34 +323,50 @@ private PartitionSplitChange findNewPartitionSplits() {
      *     partitions
      */
     private PartitionSplitChange initializePartitionSplits(PartitionChange 
partitionChange) {
-        Set<TopicPartition> newPartitions =
-                
Collections.unmodifiableSet(partitionChange.getNewPartitions());
+        Set<TopicPartition> newPartitions = partitionChange.getNewPartitions();
+        Set<TopicPartition> initialPartitions = 
partitionChange.getInitialPartitions();
 
-        OffsetsInitializer.PartitionOffsetsRetriever offsetsRetriever = 
getOffsetsRetriever();
         // initial partitions use OffsetsInitializer specified by the user 
while new partitions use
         // EARLIEST
-        final OffsetsInitializer initializer;
-        if (!initialDiscoveryFinished) {
-            initializer = startingOffsetInitializer;
-        } else {
-            initializer = newDiscoveryOffsetsInitializer;
-        }
         Map<TopicPartition, Long> startingOffsets =
-                initializer.getPartitionOffsets(newPartitions, 
offsetsRetriever);
-
+                newLinkedHashMapWithExpectedSize(newPartitions.size() + 
initialPartitions.size());
         Map<TopicPartition, Long> stoppingOffsets =
-                stoppingOffsetInitializer.getPartitionOffsets(newPartitions, 
offsetsRetriever);
+                newLinkedHashMapWithExpectedSize(newPartitions.size() + 
initialPartitions.size());
+        if (!newPartitions.isEmpty()) {
+            initOffsets(
+                    newPartitions,
+                    newDiscoveryOffsetsInitializer,
+                    startingOffsets,
+                    stoppingOffsets);
+        }
+        if (!initialPartitions.isEmpty()) {
+            initOffsets(
+                    initialPartitions, startingOffsetInitializer, 
startingOffsets, stoppingOffsets);
+        }
 
         Set<KafkaPartitionSplit> partitionSplits = new 
HashSet<>(newPartitions.size());

Review Comment:
   Shouldn't it be 
   `newPartitions.size() + initialPartitions.size()`
   ?



##########
flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/split/KafkaPartitionSplit.java:
##########
@@ -123,6 +123,10 @@ public static String toSplitId(TopicPartition tp) {
         return tp.toString();
     }
 
+    public boolean hasValidStartOffset() {

Review Comment:
   A minor suggestion. Maybe it's better to rename the method to
   ```
   public boolean isMigrated() {
       return startingOffset == MIGRATED;
   }
   ```
   
   This method could then be used in the following two places:
   1. In 
`org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator#start`
   ```
   final List<KafkaPartitionSplit> preinitializedSplits =
       unassignedSplits.values().stream()
           .filter(split -> !split.isMigrated())
           .collect(Collectors.toList());
   ```
   
   2. In 
`org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator#getPartitionChange`
   ```
   if (split.isMigrated()) {
       initialPartitions.add(split.getTopicPartition());
   }
   ```
   
   By using `isMigrated()` instead of mixing the direct constant check and 
`hasValidStartOffset()`, the code becomes consistent and relies only on a 
helpful encapsulation method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [FLINK-38681] Revise threading model in enumerator [flink-connector-kafka]

Reply via email to