hzhaop commented on code in PR #777:
URL: https://github.com/apache/skywalking-java/pull/777#discussion_r2516507523


##########
apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/remote/GRPCChannelManager.java:
##########
@@ -184,21 +184,65 @@ public Channel getChannel() {
      */
     public void reportError(Throwable throwable) {
         if (isNetworkError(throwable)) {
-            reconnect = true;
-            notify(GRPCChannelStatus.DISCONNECT);
+            triggerReconnect();
         }
     }
 
     private void notify(GRPCChannelStatus status) {
-        for (GRPCChannelListener listener : listeners) {
-            try {
-                listener.statusChanged(status);
-            } catch (Throwable t) {
-                LOGGER.error(t, "Fail to notify {} about channel connected.", 
listener.getClass().getName());
+        synchronized (listeners) {
+            for (GRPCChannelListener listener : listeners) {
+                try {
+                    listener.statusChanged(status);
+                } catch (Throwable t) {
+                    LOGGER.error(t, "Fail to notify {} about channel 
connected.", listener.getClass().getName());
+                }
             }
         }
     }
 
+    /**
+     * Create a new gRPC channel to the specified server and reset connection 
state.
+     */
+    private void createNewChannel(String host, int port) throws Exception {
+        if (managedChannel != null) {
+            managedChannel.shutdownNow();
+        }
+
+        managedChannel = GRPCChannel.newBuilder(host, port)
+                                    .addManagedChannelBuilder(new 
StandardChannelBuilder())
+                                    .addManagedChannelBuilder(new 
TLSChannelBuilder())
+                                    .addChannelDecorator(new 
AgentIDDecorator())
+                                    .addChannelDecorator(new 
AuthenticationDecorator())
+                                    .build();
+
+        // Reset reconnectCount after actually rebuilding the channel
+        reconnectCount = 0;
+        notifyConnected();
+    }
+
+    /**
+     * Trigger reconnection by setting reconnect flag and notifying listeners.
+     */
+    private void triggerReconnect() {
+        synchronized (statusLock) {
+            reconnect = true;
+            notify(GRPCChannelStatus.DISCONNECT);
+        }
+    }
+
+    /**
+     * Notify listeners that connection is established without resetting 
reconnectCount.
+     * This is used when the channel appears connected but we want to keep 
monitoring
+     * reconnect attempts in case it's a false positive (half-open connection).
+     */
+    private void notifyConnected() {
+        synchronized (statusLock) {
+            // Don't reset reconnectCount - connection might still be half-open

Review Comment:
   I haven't found a reliable way to directly detect half-open connections 
before they cause issues. `isConnected()` can return READY even when the 
connection is actually half-open on the server side.
   
   **What I observed:**
   When a half-open connection exists, gRPC's keepalive mechanism will 
eventually detect it - the keepalive PING will fail and trigger an UNAVAILABLE 
exception. However, this detection can take time (depending on keepalive 
settings).
   
   **The change:**
   The key issue with the original code was that `reconnectCount` was reset 
whenever `isConnected()` returned true:
   ```java
   // Original code
   } else if (managedChannel.isConnected(++reconnectCount > 5)) {
       reconnectCount = 0;  // Reset here - problem!
       reconnect = false;
   }
   ```
   
   When `isConnected()` returns true (even for a half-open connection), the 
counter resets, so it never reaches the threshold to force a channel rebuild.
   
   **Now:**
   ```java
   if (reconnectCount > Config.Agent.FORCE_RECONNECTION_PERIOD) {
       createNewChannel(...);  // Force rebuild
   } else if (managedChannel.isConnected(false)) {
       notifyConnected();  // Don't reset reconnectCount
   }
   ```
   
   `reconnectCount` only resets in `createNewChannel()` after actually 
rebuilding the channel. This ensures that even if `isConnected()` gives false 
positives, we'll eventually force a rebuild.
   
   Do you think there's a better approach to detect half-open connections 
directly?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to