Hi all,

We just ran into an interesting issue. We are using grpc-go for both the client and the server implementation. Two instances of the server are deployed for HA. Clients look up the servers by DNS name and are usually split evenly between the two.
One of the servers had a network issue and became unreachable (we were able to reproduce this by adding an iptables rule that drops packets destined for one of the two servers). The DNS server immediately detects that the server is unreachable and removes it from the pool. However, clients that were connected to that instance keep getting "context deadline exceeded" errors for about 15 minutes, and tcpdump shows repeated TCP retransmission attempts. Eventually (after ~15 minutes) the clients reconnect to the healthy instance.

Is there a way to speed up the failover without changing the number of TCP retransmissions in `/proc/sys/net/ipv4/tcp_retries2`?

Thanks,
JS
