I have found that GODEBUG=schedtrace still prints when the program is
frozen but I cannot find much documentation on deciphering its output. The
full output is here:
https://gist.github.com/immesys/7f213c63c54ce60ba72d1cbc9ffcc4cb but here
are the first few lines. It looks like this is garbage collector related?

SCHED 111364ms: gomaxprocs=40 idleprocs=0 threads=78 spinningthreads=0
idlethreads=65 runqueue=31 gcwaiting=1 nmidlelocked=1 stopwait=1
sysmonwait=0
P0: status=3 schedtick=36699 syscalltick=29158 m=-1 runqsize=0 gfreecnt=45
P1: status=3 schedtick=37370 syscalltick=30563 m=-1 runqsize=0 gfreecnt=31
P2: status=3 schedtick=36650 syscalltick=29478 m=-1 runqsize=0 gfreecnt=17
P3: status=3 schedtick=42395 syscalltick=34028 m=-1 runqsize=0 gfreecnt=38
P4: status=3 schedtick=40531 syscalltick=33546 m=-1 runqsize=0 gfreecnt=27
P5: status=3 schedtick=41369 syscalltick=31884 m=-1 runqsize=0 gfreecnt=19
P6: status=3 schedtick=39302 syscalltick=32162 m=-1 runqsize=0 gfreecnt=39
P7: status=3 schedtick=39739 syscalltick=32624 m=-1 runqsize=0 gfreecnt=34
P8: status=3 schedtick=32585 syscalltick=24358 m=-1 runqsize=0 gfreecnt=45
P9: status=3 schedtick=40376 syscalltick=30399 m=-1 runqsize=0 gfreecnt=27
P10: status=3 schedtick=31018 syscalltick=22164 m=-1 runqsize=0 gfreecnt=40
P11: status=3 schedtick=32043 syscalltick=24479 m=-1 runqsize=0 gfreecnt=53
P12: status=3 schedtick=36459 syscalltick=25222 m=-1 runqsize=0 gfreecnt=21
P13: status=3 schedtick=38445 syscalltick=28317 m=-1 runqsize=0 gfreecnt=38
P14: status=3 schedtick=39533 syscalltick=28189 m=-1 runqsize=0 gfreecnt=29
P15: status=3 schedtick=37009 syscalltick=27231 m=-1 runqsize=0 gfreecnt=29
P16: status=3 schedtick=36957 syscalltick=26211 m=44 runqsize=0 gfreecnt=45
P17: status=3 schedtick=35632 syscalltick=26481 m=-1 runqsize=0 gfreecnt=48
P18: status=3 schedtick=36228 syscalltick=24751 m=-1 runqsize=0 gfreecnt=59
P19: status=3 schedtick=32479 syscalltick=22970 m=-1 runqsize=0 gfreecnt=48
P20: status=3 schedtick=34090 syscalltick=25864 m=-1 runqsize=0 gfreecnt=12
P21: status=3 schedtick=35409 syscalltick=24369 m=-1 runqsize=0 gfreecnt=6
P22: status=3 schedtick=42235 syscalltick=28607 m=-1 runqsize=0 gfreecnt=16
P23: status=3 schedtick=29733 syscalltick=19429 m=-1 runqsize=0 gfreecnt=43
P24: status=3 schedtick=34757 syscalltick=22141 m=-1 runqsize=0 gfreecnt=26
P25: status=3 schedtick=31434 syscalltick=21767 m=-1 runqsize=0 gfreecnt=23
P26: status=3 schedtick=31597 syscalltick=21657 m=-1 runqsize=0 gfreecnt=53
P27: status=3 schedtick=32539 syscalltick=22653 m=-1 runqsize=0 gfreecnt=34
P28: status=3 schedtick=29503 syscalltick=20588 m=-1 runqsize=0 gfreecnt=28
P29: status=3 schedtick=30842 syscalltick=20199 m=-1 runqsize=0 gfreecnt=41
P30: status=3 schedtick=29787 syscalltick=19492 m=-1 runqsize=0 gfreecnt=15
P31: status=3 schedtick=28226 syscalltick=19101 m=-1 runqsize=0 gfreecnt=32
P32: status=3 schedtick=34254 syscalltick=24135 m=-1 runqsize=0 gfreecnt=27
P33: status=3 schedtick=34567 syscalltick=22554 m=-1 runqsize=0 gfreecnt=36
P34: status=3 schedtick=30416 syscalltick=19722 m=-1 runqsize=0 gfreecnt=33
P35: status=1 schedtick=19536 syscalltick=13084 m=0 runqsize=0 gfreecnt=50
P36: status=3 schedtick=29336 syscalltick=19015 m=-1 runqsize=0 gfreecnt=28
P37: status=3 schedtick=29794 syscalltick=19499 m=-1 runqsize=0 gfreecnt=40
P38: status=3 schedtick=31748 syscalltick=20359 m=-1 runqsize=0 gfreecnt=35
P39: status=3 schedtick=31851 syscalltick=20260 m=-1 runqsize=0 gfreecnt=46



On Thu, Feb 22, 2018 at 8:47 PM, Michael Andersen <mich...@steelcode.com>
wrote:

> Hi
>
> Thanks for your suggestions. This freeze happens to coincide with a larger
> number of cgo calls which in turn most likely do blocking read/write from
> sockets. How are those treated by the scheduler?
>
> I am not doing anything FUSE related, nor do I have assembly code. The
> only "interesting" stuff is the use of a c library - librados.
>
> In the goroutine dump from SIGQUIT, some stacks are followed by register
> dumps. Are those the only ones currently running? All of those are in
> runtime: 16 are in runtime.futex, 2 are in runtime.notetsleepg and one in
> runtime.gopark, which would make it seem like it was not an un-preemptable
> tight loop, but I am not sure how to parse the SIGQUIT output.
>
> Thanks
> Michael
>
>
>> I don't know what is happening with your program.
>>
>> This kind of thing can happen if you have a goroutine that is running
>> in a tight loop with no function calls or memory allocations.  The
>> current Go scheduler is non-preemptive, meaning that nothing will stop
>> that loop.  If that loop occurs while holding a lock, it could block
>> the entire rest of the program from running.  However, you would see
>> this in the stack trace.  This problem with the current scheduler is
>> https://golang.org/issue/10958.
>>
>> This kind of thing can happen if you are using an in-process FUSE file
>> system implemented by goroutines in your program.  The Go runtime
>> believes that some system calls, such as pipe or socket, never block.
>> If you have somehow set things up so that those system calls enter
>> your FUSE file system and depend on some other goroutine running, it
>> is possible that that goroutine will never be scheduled.  I don't know
>> of any bugs like this at present but they have existed in the past.
>> Of course if you aren't using a FUSE file system then this is not the
>> problem.
>>
>> This kind of thing can happen if you use assembler code to do a
>> blocking operation, or if you use syscall.Rawsyscall to call a system
>> call that blocks.  That can confuse the scheduler and lead to a
>> deadlock.  Don't do that.
>>
>> None of these are likely, but you asked for suggestions, and that's
>> what I've come up with.
>>
>> Ian
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to