Incidentally, it just happened again without involving the GC stop the world, where all 8 P's were running and had the same program counters for 4 seconds. They are the same as those in the previous email.
On Mon, Aug 26, 2019 at 2:11 PM Michael Andersen <mich...@steelcode.com> wrote: > Hi > > I was concerned that perhaps the stacks that I dumped after the scheduling > stall did not reflect the actual PC's of the goroutines during the stall, > so I modified schedtrace to dump out g.sched.pc and g.m.vdsoPC for each G > currently on the P at the time schedtrace prints. > > It just occurred again a few minutes ago. Previously, all the P's were > running the problematic code, so other things weren't scheduled. This time > it appears that only a few P's (4 of 8) were running the problematic code > but the GC was trying to stop the world so even though there are P's with > nothing on them, nothing else ran. All the PC's on the problematic > goroutines (those with status=_Prunning) were static for the 3 seconds they > stalled. They all had the same PC's: > > g.sched.pc was 0x43389f which maps to: Line 302 of > "/usr/local/go/src/runtime/proc.go" starts at address 0x43389f > <runtime.gopark+239> and ends at 0x4338b7 <runtime.gopark+263>. > g.m.vdsoPC was 0x450c68 which maps to: Line 17 of > "/usr/local/go/src/runtime/timestub.go" starts at address 0x450c63 > <time.now+51> and ends at 0x450c8d <time.now+93>. > > In case there is more useful information that I missed, I attached the log > output. > > I'm trying to make a simpler program that reproduces this, but thus far > running time.Now in a tight loop across multiple goroutines isn't causing > it. Any ideas what I could try next? > > Thanks > Michael > > > > On Sat, Aug 24, 2019 at 9:23 PM Michael Andersen <mich...@steelcode.com> > wrote: > >> >>> >>> > Out of interest what OS? >>> > Is the machine virtualised? >>> >> >> Yes, this on EC2, on m5.2xlarge (which are nitro instances with >> vDSO-supporting clock). I am running Linux 4.9.0 >> >> >>> Also, are you running a kernel with VDSO enabled? With VDSO, >>> `time.Now` should not be making any system calls. It's hard to >>> understand why it would block. Although the VDSO code does loop, and >>> perhaps virtualization affects that in some way that I'm not aware of. >>> >> >> I believe vDSO is enabled. I ran this check: >> >> cat /sys/devices/system/clocksource/clocksource0/current_clocksource >> kvm-clock >> strace -e gettimeofday,clock_gettime -- date >/dev/null >> +++ exited with 0 +++ >> >> and there are no syscalls observed by strace. >> >> >>> Does your code call `time.Now` a lot? >>> >> >> Yes, about 100k times a second. I'll look into reducing that, but even if >> that solves the problem, I would still be curious as to why it's happening. >> >> I'll dig into this a bit more on Monday. >> > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAPLpPrtWG%3DpAmWoO%2BZYYADocHQx8Ts1Ld1XZxF9xCM51n4aNsg%40mail.gmail.com.