On Mon, Jul 1, 2019 at 12:42 PM 'Yunchi Luo' via golang-nuts <golang-nuts@googlegroups.com> wrote:
> Hello, I'd like to solicit some help with a weird GC issue we are seeing.
>
> I'm trying to debug OOM on a service we are running in k8s. The service is
> just a CRUD server hitting a database (DynamoDB). Each replica serves about
> 300 qps of traffic. There are no memory leaks. On occasion (seemingly
> correlated to small latency spikes on the backend), the service would OOM.
> This is surprising because it has a circuit breaker that drops requests
> after 200 concurrent connections that has never tripped, and goroutine
> profiles confirm that there are nowhere near 200 active goroutines.
>

Just curious about the network connections. Is there a chance that they are not getting closed and cleaned up for some reason?

It was common for sockets to hang around in the thousands because a user killed a slow tab or the browser and the full socket close never completed. The solution was to let reliable connections time out and finish closing, which freed up the memory. In that state the application has closed the socket, but the protocol is still waiting for the last packet to complete the handshake. The shell equivalent would be zombie processes that still need to return their exit status while no process waits on it. Debugging can be interesting in the shell case because of implied waits done by ps.

How many connections does the kernel think there are, and what state are they in? Look both locally and on the DB machine. The latency spikes can be a cause or a symptom.

Look at the connections being made to the CRUD server and make sure they are set up with timers short enough that they clean themselves up quickly. Is the CRUD server at risk of a denial of service, or of a burst of curious probes from an nmap script? Even firewall drops, near or far, can leave connections hanging in an incomplete state when an invalid connection is detected and blocked and reliable connections with long timers are involved.