Re: [PR] KAFKA-15890: Consumer.poll with long timeout unaware of assigned partitions [kafka]

via GitHub Thu, 30 Nov 2023 12:25:43 -0800


kirktrue commented on code in PR #14835:
URL: https://github.com/apache/kafka/pull/14835#discussion_r1411218465



##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/CommitRequestManager.java:
##########
@@ -800,6 +811,11 @@ public void resetTimer() {
             this.timer.reset(autoCommitInterval);
         }
 
+        public long remainingMs(final long currentTimeMs) {
+            this.timer.update(currentTimeMs);
+            return this.timer.remainingMs();
+        }
+

Review Comment:
   For example, this updates the `Timer` from both threads.



##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/MembershipManagerImpl.java:
##########
@@ -1074,6 +1074,7 @@ boolean reconciliationInProgress() {
     public void onUpdate(ClusterResource clusterResource) {
         resolveMetadataForUnresolvedAssignment();
         if (!assignmentReadyToReconcile.isEmpty()) {
+            transitionTo(MemberState.RECONCILING);

Review Comment:
   OK, I think I'm running into this too. In my tests, the state machine transitioned to `RECONCILING` preemptively, only to find out later that the assignments were the same, after which it returned from `reconcile()` without resetting the state.
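
   The failure mode described above can be sketched in isolation. This is a minimal illustration, not the code in `MembershipManagerImpl`: the class `ReconcileStateSketch`, its two-value `MemberState` enum, and the `currentAssignment` field are all hypothetical stand-ins. It assumes one possible remedy, namely that the early-return path of `reconcile()` resets the state itself.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the member state machine; names are illustrative only.
class ReconcileStateSketch {
    enum MemberState { STABLE, RECONCILING }

    MemberState state = MemberState.STABLE;
    Set<String> currentAssignment = new TreeSet<>();

    // Mirrors the diff: transition before knowing whether anything changed.
    void onUpdate(Set<String> targetAssignment) {
        state = MemberState.RECONCILING;
        reconcile(targetAssignment);
    }

    // Returns true if the assignment actually changed.
    boolean reconcile(Set<String> targetAssignment) {
        if (currentAssignment.equals(targetAssignment)) {
            // Without this reset, the member would stay in RECONCILING
            // even though there is nothing to reconcile.
            state = MemberState.STABLE;
            return false;
        }
        currentAssignment = new TreeSet<>(targetAssignment);
        // State stays RECONCILING until something else completes the reconciliation.
        return true;
    }
}
```

   An alternative with the same effect would be to compare the assignments before transitioning at all, so the state never changes for a no-op update.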



##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java:
##########
@@ -205,6 +205,26 @@ public void wakeup() {
             networkClientDelegate.wakeup();
     }
 
+    /**
+     * Returns the delay for which the application thread can safely wait before it should be responsive
+     * to results from the request managers. For example, the subscription state can change when heartbeats
+     * are sent, so blocking for longer than the heartbeat interval might mean the application thread is not
+     * responsive to changes.
+     *
+     * @return The maximum delay in milliseconds
+     */
+    public long maximumTimeToWait() {
+        final long currentTimeMs = time.milliseconds();
+        if (requestManagers == null) {
+            return MAX_POLL_TIMEOUT_MS;
+        }
+        return requestManagers.entries().stream()
+                .filter(Optional::isPresent)
+                .map(Optional::get)
+                .map(rm -> rm.maximumTimeToWait(currentTimeMs))
+                .reduce(Long.MAX_VALUE, Math::min);
+    }
+

Review Comment:
   This will invoke the request managers directly from the application thread, right?
   
   If so, I'm a little concerned by this approach because the implementation of `maximumTimeToWait` in each `RequestManager` will be executed by the application thread. In the implementations elsewhere in this PR, we're reading and writing state in the request managers that is only intended to be read or written by the network I/O thread.
   
   We need to be careful here 🤔
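
   One way to keep the application thread out of the request managers, sketched under assumptions: the network I/O thread computes the minimum wait during its own loop and publishes it through an atomic field, and the application thread only ever reads that field. The class `CachedMaximumTimeToWait`, the `refresh` method, the 5000 ms cap, and the use of `LongUnaryOperator` to stand in for `RequestManager.maximumTimeToWait(currentTimeMs)` are all hypothetical and not taken from the PR.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Hypothetical sketch; not the PR's code.
class CachedMaximumTimeToWait {
    // Assumed cap, standing in for MAX_POLL_TIMEOUT_MS in the diff above.
    static final long MAX_POLL_TIMEOUT_MS = 5000L;

    // The only state shared between the two threads.
    private final AtomicLong cachedMs = new AtomicLong(MAX_POLL_TIMEOUT_MS);

    // Intended to be called only by the network I/O thread. Each operator
    // stands in for one request manager's maximumTimeToWait(currentTimeMs).
    void refresh(List<LongUnaryOperator> requestManagers, long currentTimeMs) {
        long min = MAX_POLL_TIMEOUT_MS;
        for (LongUnaryOperator rm : requestManagers) {
            min = Math.min(min, rm.applyAsLong(currentTimeMs));
        }
        cachedMs.set(min);
    }

    // Safe to call from the application thread: reads the published value
    // and never touches request manager state.
    long maximumTimeToWait() {
        return cachedMs.get();
    }
}
```

   The trade-off is that the application thread may see a value that is one network-thread iteration stale, which seems acceptable for a wait bound but would need to be confirmed against the consumer's actual loop.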



##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##########
@@ -417,4 +394,127 @@ private void updateHeartbeatIntervalMs(final long heartbeatIntervalMs) {
             this.heartbeatTimer.updateAndReset(heartbeatIntervalMs);
         }
     }
+
+    /**
+     * Builds the heartbeat requests correctly, ensuring that all information is sent according to
+     * the protocol, but subsequent requests do not send information which has not changed. This
+     * is important to ensure that reconciliation completes successfully.
+     */
+    static class HeartbeatState {

Review Comment:
   Fair enough 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
