hzhaop commented on code in PR #777:
URL: https://github.com/apache/skywalking-java/pull/777#discussion_r2516507523
##########
apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/remote/GRPCChannelManager.java:
##########
@@ -184,21 +184,65 @@ public Channel getChannel() {
*/
public void reportError(Throwable throwable) {
if (isNetworkError(throwable)) {
- reconnect = true;
- notify(GRPCChannelStatus.DISCONNECT);
+ triggerReconnect();
}
}
private void notify(GRPCChannelStatus status) {
- for (GRPCChannelListener listener : listeners) {
- try {
- listener.statusChanged(status);
- } catch (Throwable t) {
- LOGGER.error(t, "Fail to notify {} about channel connected.",
listener.getClass().getName());
+ synchronized (listeners) {
+ for (GRPCChannelListener listener : listeners) {
+ try {
+ listener.statusChanged(status);
+ } catch (Throwable t) {
+ LOGGER.error(t, "Fail to notify {} about channel
connected.", listener.getClass().getName());
+ }
}
}
}
+ /**
+ * Create a new gRPC channel to the specified server and reset connection
state.
+ */
+ private void createNewChannel(String host, int port) throws Exception {
+ if (managedChannel != null) {
+ managedChannel.shutdownNow();
+ }
+
+ managedChannel = GRPCChannel.newBuilder(host, port)
+ .addManagedChannelBuilder(new
StandardChannelBuilder())
+ .addManagedChannelBuilder(new
TLSChannelBuilder())
+ .addChannelDecorator(new
AgentIDDecorator())
+ .addChannelDecorator(new
AuthenticationDecorator())
+ .build();
+
+ // Reset reconnectCount after actually rebuilding the channel
+ reconnectCount = 0;
+ notifyConnected();
+ }
+
+ /**
+ * Trigger reconnection by setting reconnect flag and notifying listeners.
+ */
+ private void triggerReconnect() {
+ synchronized (statusLock) {
+ reconnect = true;
+ notify(GRPCChannelStatus.DISCONNECT);
+ }
+ }
+
+ /**
+ * Notify listeners that connection is established without resetting
reconnectCount.
+ * This is used when the channel appears connected but we want to keep
monitoring
+ * reconnect attempts in case it's a false positive (half-open connection).
+ */
+ private void notifyConnected() {
+ synchronized (statusLock) {
+ // Don't reset reconnectCount - connection might still be half-open
Review Comment:
I haven't found a reliable way to directly detect half-open connections
before they cause issues. `isConnected()` can return READY even when the
connection is actually half-open on the server side.
**What I observed:**
When a half-open connection exists, gRPC's keepalive mechanism will
eventually detect it - the keepalive PING will fail and trigger an UNAVAILABLE
exception. However, this detection can take time (depending on keepalive
settings).
**The change:**
The key issue with the original code was that `reconnectCount` was reset
whenever `isConnected()` returned true:
```java
// Original code
} else if (managedChannel.isConnected(++reconnectCount > 5)) {
reconnectCount = 0; // Reset here - problem!
reconnect = false;
}
```
When `isConnected()` returns true (even for a half-open connection), the
counter resets, so it never reaches the threshold to force a channel rebuild.
**Now:**
```java
if (reconnectCount > Config.Agent.FORCE_RECONNECTION_PERIOD) {
createNewChannel(...); // Force rebuild
} else if (managedChannel.isConnected(false)) {
notifyConnected(); // Don't reset reconnectCount
}
```
`reconnectCount` only resets in `createNewChannel()` after actually
rebuilding the channel. This ensures that even if `isConnected()` gives false
positives, we'll eventually force a rebuild.
Do you think there's a better approach to detect half-open connections
directly?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]