Hi all

I have a complex program that, when under load, very reproducibly freezes
every goroutine simultaneously. It then makes no progress at all, even if
left for hours. I'm posting here because I don't know of anything that can
cause this behavior, so I don't even know where to begin debugging. When it
"freezes", no goroutine appears to be scheduled any more, no matter how
simple it is. Even this, started at the beginning of main(), stops printing
to stdout:

go func() {
	for {
		time.Sleep(3 * time.Second)
		fmt.Printf("Still alive\n")
	}
}()

The system is nowhere near OOM, and the goroutine count is large but
reasonable (<2k) just before the freeze. After the freeze the process is
still running, and attaching sysdig shows it stuck spinning in futex, with
only this showing up over and over:

637779 17:21:56.254826712 20 prog (43085) < futex res=-110(ETIMEDOUT)
637782 17:21:56.254827305 20 prog (43085) > futex addr=10D5FA0 op=0(FUTEX_WAIT) val=0
637783 17:21:56.254828132 20 prog (43085) > switch next=0 pgft_maj=0 pgft_min=60361 vm_size=20710168 vm_rss=10792276 vm_swap=0

The "frozen" program still responds to SIGQUIT and dumps its goroutines,
but since this is not a minimal reproducer (I haven't managed to make one),
I don't know which parts of that dump are useful. I put all of it here:
https://gist.github.com/immesys/0b741e4ea18979614d8419fa9c007098
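For comparison, a similar all-goroutine dump can be taken from inside a process without killing it, which may help when diffing a healthy dump against the frozen one. This is a minimal sketch, assuming SIGUSR1 is otherwise unused in the program (an assumption; `allGoroutineStacks` and `dumpOnSignal` are hypothetical helper names). It relies on `runtime.Stack(buf, true)`, which writes every goroutine's stack in the same text format the runtime prints on SIGQUIT. Unlike the runtime's own SIGQUIT handling, the handler goroutine here must itself be scheduled, so it presumably would not fire once the program is frozen.

```go
package main

import (
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// allGoroutineStacks returns the stacks of every goroutine. runtime.Stack
// truncates its output to the buffer size, so the buffer is doubled until
// the whole dump fits.
func allGoroutineStacks() []byte {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

// dumpOnSignal writes all goroutine stacks to stderr every time sig arrives.
// Unlike the default SIGQUIT behavior, the process keeps running afterwards.
func dumpOnSignal(sig os.Signal) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, sig)
	go func() {
		for range ch {
			os.Stderr.Write(allGoroutineStacks())
		}
	}()
}

func main() {
	// Assumption: SIGUSR1 is not used for anything else in this program.
	dumpOnSignal(syscall.SIGUSR1)
	// ... the real program would run here; `kill -USR1 <pid>` prints a dump ...
	os.Stderr.Write(allGoroutineStacks()) // one immediate dump as a demonstration
}
```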

My main question is: what sort of bugs can cause the whole program to lock
up? Even if some goroutines were deadlocked, why would that stop everything,
from net/http/pprof to a printf loop, from working?

Some tidbits:

- I have a core dump, so I can inspect things with delve if I know what I am
  looking for.
- Building/running with -race doesn't print anything.
- I came across this thread
  (https://groups.google.com/forum/#!msg/golang-nuts/PMm8nH0yaoA/mb-cnKmZlb4J),
  which describes a similar occurrence, but I don't interact with syslog, at
  least not directly.
- I am getting this on Go 1.10, but I rebuilt with 1.9.4 and get the same
  behavior.
- I am on Linux amd64, kernel 4.10.
- It only takes about two minutes to reproduce.
- When frozen, only a single CPU core is pegged; the rest of the system is
  fine.

Any help at all would be appreciated, thanks
Michael



-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.