On Fri, Aug 4, 2017 at 1:27 AM, Konstantin Khomoutov <kos...@bswap.ru> wrote:
>
> We're experiencing a problem with our program which serves HTTP requests.
>
> Its clients have TCP connection timeouts set to 1 second, and under
> certain pattern of heavy load the server fails to perform some
> net.netFD.Accept() calls in time so a fraction of clients gets I/O
> timeouts when attempting to connect.
>
> Thanks to the Go runtime tracing facility, we were able to pinpoint that
> those I/O timeouts happen due to the goroutine which does the accept() call
> being unblocked by the netpoller but then waits in a run queue to
> actually have a chance to run for too long (in our case -- sometimes for
> over two seconds). We hypothesize that this happens due to heavy load on
> the goroutine scheduler.

Assuming you are correct--it sounds plausible--then another way to say this is that your server is overloaded. When a server is overloaded, it's generally best to stop accepting new work until some of the existing work is complete and the server has more capacity. You are suggesting doing exactly the opposite: accepting new connections, which means doing even more work. The overall effect will be that your server will get more and more overloaded until it eventually fails in some way. Of course, there may be something specific about your system that means that this will not happen. But the rule of thumb for an overloaded server is to stop accepting new work, and force the clients to retry or to use a different server.

> - Lock the goroutine which runs the net/http.Server instance to its
>   underlying OS thread (and maybe do a syscall to raise that thread's
>   priority above normal).
>
>   We did not yet test this approach but it appears to be better than the
>   first as it should lower possible contentions between multiple
>   instances of the net/http.Server type, and makes the whole setup
>   simpler.
>
> I'd like to solicit insight on whether the latter approach is workable.
>
> I sadly lack full understanding on how actually running of goroutines
> and the runtime scheduler really interacts with the scheduling of the
> threads done by the OS. For instance, would the described setup do
> anything to help the goroutine doing accept() have more execution
> quanta?

No, I don't think so.

> I'm in doubt because of the interaction between the netpoller and the
> goroutines. Say, a goroutine has itself locked to its underlying OS
> thread, that thread has its OS-level priority raised. Now suppose that
> goroutine does the accept() syscall; the socket backlog is zero, so the
> syscall would block and so it goes to the netpoller and the goroutine
> gets parked. Since it has its thread locked, my understanding is such
> that thread gets essentially dormant (and the runtime is free to spawn
> another one to fulfill GOMAXPROCS).

Yes.

> Now what happens in these two cases:
>
> - A client initiates the connection and the netpoller unblocks the
>   goroutine doing accept.
>
>   As I understand, the runtime will merely figure out that that
>   particular goroutine must run on its dedicated thread, so it will make
>   its P execute it on that thread (hope I got the terminology right).
>
>   Will the fact the goroutine was bound to its own thread help it get
>   executed with priority compared to other goroutines (which essentially
>   contend for GOMAXPROCS other threads)?

No.

> - What happens to the thread, to which a goroutine was locked, while
>   that goroutine remains parked? Is this thread somehow suspended and
>   the OS never wakes it until the goroutine gets unblocked?
>   Or is the OS free to schedule it, and if yes, what runs on it when it
>   gets its execution quantum?

The thread blocks waiting for the G to become ready and for the scheduler to assign it a P.

Ian
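
For what it's worth, the "stop accepting new work" rule of thumb can be sketched roughly as below. This is only an illustration under stated assumptions, not a fix for the accept() latency itself: it sheds load at the HTTP handler level (after the connection has already been accepted) by answering 503 once a fixed number of requests are in flight. The names limiter, shed, okHandler and statusFor, the limit of 1, and the Retry-After value are all made up for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limiter is a hypothetical counting semaphore built on a buffered channel.
type limiter struct{ slots chan struct{} }

func newLimiter(n int) *limiter { return &limiter{slots: make(chan struct{}, n)} }

// tryAcquire takes a slot without blocking; it reports false when all
// slots are in use.
func (l *limiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release returns a previously acquired slot.
func (l *limiter) release() { <-l.slots }

// shed wraps next so that requests arriving while the limiter is full are
// rejected immediately with 503 instead of adding to the existing load.
func shed(l *limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			w.Header().Set("Retry-After", "1") // assumed value: ask clients to retry later
			http.Error(w, "server overloaded", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the real application handler.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
})

// statusFor runs one in-memory request through h and returns the status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	l := newLimiter(1)
	h := shed(l, okHandler)

	fmt.Println(statusFor(h)) // capacity is free, so the request is served

	l.tryAcquire()            // simulate one request still in flight
	fmt.Println(statusFor(h)) // limiter is full, so the request is shed
	l.release()
}
```

In a real server the wrapped handler would be passed to http.Server as its Handler. Limiting at the listener instead (blocking Accept until a slot frees up) is closer to literally not accepting new connections, at the cost of clients waiting in the kernel backlog.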