Incidentally, it just happened again without involving the GC stop the
world, where all 8 P's were running and had the same program counters for 4
seconds. They are the same as those in the previous email.

On Mon, Aug 26, 2019 at 2:11 PM Michael Andersen <mich...@steelcode.com>
wrote:

> Hi
>
> I was concerned that perhaps the stacks that I dumped after the scheduling
> stall did not reflect the actual PC's of the goroutines during the stall,
> so I modified schedtrace to dump out g.sched.pc and g.m.vdsoPC for each G
> currently on the P at the time schedtrace prints.
>
> It just occurred again a few minutes ago. Previously, all the P's were
> running the problematic code, so other things weren't scheduled. This time
> it appears that only a few P's (4 of 8) were running the problematic code
> but the GC was trying to stop the world so even though there are P's with
> nothing on them, nothing else ran. All the PC's on the problematic
> goroutines (those with status=_Prunning) were static for the 3 seconds they
> stalled. They all had the same PC's:
>
> g.sched.pc was 0x43389f which maps to: Line 302 of
> "/usr/local/go/src/runtime/proc.go" starts at address 0x43389f
> <runtime.gopark+239> and ends at 0x4338b7 <runtime.gopark+263>.
> g.m.vdsoPC was 0x450c68 which maps to: Line 17 of
> "/usr/local/go/src/runtime/timestub.go" starts at address 0x450c63
> <time.now+51> and ends at 0x450c8d <time.now+93>.
>
> In case there is more useful information that I missed, I attached the log
> output.
>
> I'm trying to make a simpler program that reproduces this, but thus far
> running time.Now in a tight loop across multiple goroutines isn't causing
> it. Any ideas what I could try next?
>
> Thanks
> Michael
>
>
>
> On Sat, Aug 24, 2019 at 9:23 PM Michael Andersen <mich...@steelcode.com>
> wrote:
>
>>
>>>
>>> > Out of interest what OS?
>>> > Is the machine virtualised?
>>>
>>
>> Yes, this on EC2, on m5.2xlarge (which are nitro instances with
>> vDSO-supporting clock). I am running Linux 4.9.0
>>
>>
>>> Also, are you running a kernel with VDSO enabled?  With VDSO,
>>> `time.Now` should not be making any system calls.  It's hard to
>>> understand why it would block.  Although the VDSO code does loop, and
>>> perhaps virtualization affects that in some way that I'm not aware of.
>>>
>>
>> I believe vDSO is enabled. I ran this check:
>>
>> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>> kvm-clock
>> strace -e gettimeofday,clock_gettime -- date >/dev/null
>> +++ exited with 0 +++
>>
>> and there are no syscalls observed by strace.
>>
>>
>>> Does your code call `time.Now` a lot?
>>>
>>
>> Yes, about 100k times a second. I'll look into reducing that, but even if
>> that solves the problem, I would still be curious as to why it's happening.
>>
>> I'll dig into this a bit more on Monday.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAPLpPrtWG%3DpAmWoO%2BZYYADocHQx8Ts1Ld1XZxF9xCM51n4aNsg%40mail.gmail.com.

Reply via email to