hzhaop opened a new pull request, #777: URL: https://github.com/apache/skywalking-java/pull/777
This commit addresses two issues related to gRPC connection stability and recovery.

1. **Half-open connections:** In unstable network environments, the agent could encounter half-open TCP connections, where the server side of the connection has been terminated but the client side still treats it as open. This caused the send queue to grow indefinitely, with no automatic recovery. To resolve this, this change introduces gRPC keepalive probes. The agent now sends keepalive pings to the collector, so that dead connections are detected and pruned in a timely manner. Two new configuration parameters, `collector.grpc_keepalive_time` and `collector.grpc_keepalive_timeout`, have been added to control this behavior.

2. **Reconnect logic:** The existing reconnection logic did not immediately re-establish a connection if the same backend instance was selected during a reconnect attempt. This could lead to a delay of up to an hour before the connection was re-established. The logic has been updated so that the channel is always shut down and recreated, forcing an immediate reconnection attempt regardless of which backend is selected.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
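The reconnect change described in the pull request can be illustrated with a minimal, self-contained sketch. It is not the agent's actual code: `FakeChannel` and `ChannelManager` are hypothetical stand-ins, and the backend address is an invented example. The only point it makes is that a reconnect always shuts down the old channel and creates a fresh one, even when the selected backend is unchanged. In a real grpc-java client, the keepalive interval and timeout would presumably be applied where the channel is built, via `ManagedChannelBuilder.keepAliveTime` and `ManagedChannelBuilder.keepAliveTimeout`; how the PR wires its two new settings into those calls is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

public class ReconnectSketch {
    /** Hypothetical stand-in for a gRPC channel; not a SkyWalking or grpc-java type. */
    static final class FakeChannel {
        final String backend;
        boolean shutdown = false;

        FakeChannel(String backend) {
            this.backend = backend;
        }

        void shutdownNow() {
            shutdown = true;
        }
    }

    /** Hypothetical manager that owns at most one live channel. */
    static final class ChannelManager {
        private FakeChannel current;
        final List<FakeChannel> created = new ArrayList<>();

        /**
         * Always tears down the existing channel and builds a new one,
         * even if the selected backend equals the current one.
         */
        FakeChannel reconnect(String selectedBackend) {
            if (current != null) {
                current.shutdownNow();
            }
            // A real client would build the channel here, with keepalive settings applied.
            current = new FakeChannel(selectedBackend);
            created.add(current);
            return current;
        }
    }

    public static void main(String[] args) {
        ChannelManager manager = new ChannelManager();
        FakeChannel first = manager.reconnect("collector-a:11800");
        // Same backend selected again: the old channel is still replaced.
        FakeChannel second = manager.reconnect("collector-a:11800");
        System.out.println("old channel shut down: " + first.shutdown);
        System.out.println("fresh channel created: " + (first != second));
    }
}
```

Skipping the recreation step when the backend address is unchanged would leave a dead channel in place until something else forced a rebuild, which is consistent with the delay the pull request describes.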
