I was wrong about the GC not getting memory back from dead goroutines. I think it does reclaim it, through the gcResetMarkState function, so I don't think the number of goroutines is the issue. I'm sorry if I misled you.
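If anyone wants to poke at that empirically, here is a rough sketch (my own quick illustration, not code from this thread) that watches runtime.MemStats.StackInuse before, during, and after a burst of goroutines. The exact numbers are only indicative, but StackInuse should drop back down once the goroutines have exited and a GC has run:

    // Sketch: does goroutine stack memory come back after the goroutines exit?
    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func stackInuse() uint64 {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        return m.StackInuse
    }

    func main() {
        fmt.Printf("before: %d KB stack in use\n", stackInuse()/1024)

        const n = 50000
        release := make(chan struct{})
        var started, done sync.WaitGroup
        started.Add(n)
        done.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                defer done.Done()
                started.Done()
                <-release // hold the goroutine (and its stack) alive
            }()
        }
        started.Wait()
        fmt.Printf("with %d live goroutines: %d KB stack in use\n", n, stackInuse()/1024)

        close(release)
        done.Wait()
        runtime.GC() // give the runtime a chance to free the dead goroutines' stacks
        fmt.Printf("after exit + GC: %d KB stack in use\n", stackInuse()/1024)
    }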
On Tuesday, 30 April 2019 00:28:29 UTC-7, vaastav anand wrote:
>
> The stack trace only lists goroutines that are not dead, not system goroutines, and not the goroutine that is calling the traceback function (src/runtime/traceback.go).
>
> Additionally, I don't think Go reclaims any memory from dead goroutines. The allgs struct in src/runtime/proc.go in the Go source holds all the goroutines that have been created during the lifetime of the program, and it is all heap allocated. I don't know whether the garbage collector reclaims any of these dead goroutines, but I don't think it does, because nothing ever seems to be removed from allgs.
>
> On Monday, 29 April 2019 23:54:54 UTC-7, Justin Israel wrote:
>>
>> On Tue, Apr 30, 2019 at 6:33 PM vaastav anand <vaastav...@gmail.com> wrote:
>>
>>> I have encountered a SIGBUS with Go before, but I was hacking inside the runtime and using shared memory with mmap.
>>>
>>> Goroutines are assigned IDs incrementally, and each goroutine has, at bare minimum, 2.1KB of stack space in go1.11, down from 2.7KB in go1.10, if I recall correctly. So at the very least, by that point you could easily have burnt through at least 7.5GB of memory. I am not sure what could happen if you somehow exceed the amount of memory available. That seems like a test you could write: launch more goroutines than could fit in memory and see whether it actually causes a SIGBUS.
>>
>> The stack trace only listed 282 goroutines, which seems about right considering the number of clients that are connected. It's about 3 goroutines per client connection, plus the other stuff in the server. I think it just indicates that I have turned over a lot of client connections over time.
>>
>>> On Monday, 29 April 2019 23:25:52 UTC-7, Justin Israel wrote:
>>>>
>>>> On Tue, Apr 30, 2019 at 6:09 PM vaastav anand <vaastav...@gmail.com> wrote:
>>>>
>>>>> Ok, so in the 2nd piece of code you posted, is some request being pushed onto some OS queue? If so, is it possible that you may be maxing the queue out and then pushing something else into it, and that could cause a SIGBUS as well? This seems super far-fetched, though; it is hard to debug without really knowing what the application might really be doing.
>>>>
>>>> I want to say that I really appreciate you taking the time to try and give me some possible ideas, even though this is a really vague problem. I had only hoped someone had encountered something similar.
>>>>
>>>> So that line in the SIGBUS crash is just trying to add a subscription to a message topic callback in the nats client connection: https://godoc.org/github.com/nats-io/go-nats#Conn.Subscribe
>>>> It's pretty high-level logic at my application level.
>>>>
>>>> One thing that stood out to me was that in the crash, the goroutine id number was 3538668. I had to double check to confirm that the go runtime just uses an insignificant increasing number. I guess it does indicate that the application turned over > 3 million goroutines by that point. I'm wondering if this is caused by something in the gnatsd embedded server (https://github.com/nats-io/gnatsd/tree/master/server), since most of the goroutines do come from that, with all the client handling going on. If we are talking about something that is managing very large queues, that would be the one doing so in this application.
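On the goroutine-number point above: the id in a traceback really is just a monotonically increasing counter. A quick way to double check that, purely as a debugging aid (this is my own sketch, not code from this thread, and goroutine IDs are deliberately not exposed as a stable API), is to parse the first line of runtime.Stack output:

    // Debug-only sketch: read the current goroutine's id from a stack dump
    // to confirm it is nothing more than an ever-increasing counter.
    package main

    import (
        "bytes"
        "fmt"
        "runtime"
        "strconv"
        "sync"
    )

    // goid parses "goroutine N [running]:" from the first line of runtime.Stack.
    func goid() uint64 {
        buf := make([]byte, 64)
        buf = buf[:runtime.Stack(buf, false)]
        buf = bytes.TrimPrefix(buf, []byte("goroutine "))
        buf = buf[:bytes.IndexByte(buf, ' ')]
        id, _ := strconv.ParseUint(string(buf), 10, 64)
        return id
    }

    func main() {
        fmt.Println("main goroutine id:", goid())
        var wg sync.WaitGroup
        for i := 0; i < 3; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                fmt.Println("spawned goroutine id:", goid()) // ids keep counting up
            }()
        }
        wg.Wait()
    }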
>>>>> On Monday, 29 April 2019 22:57:40 UTC-7, Justin Israel wrote:
>>>>>>
>>>>>> On Tue, Apr 30, 2019 at 5:43 PM vaastav anand <vaastav...@gmail.com> wrote:
>>>>>>
>>>>>>> I'd be very surprised if the anonymous goroutine is the reason behind a SIGBUS violation.
>>>>>>>
>>>>>>> If I remember SIGBUS correctly, it means that you are issuing a read/write to a memory address which is not really addressable, or which is misaligned. I think the chances of the address being misaligned are very low, so it really has to be a non-existent address. That can happen if you try to access memory outside the region mmapped into your application. If your application has any kind of mmap or shared memory access, I would start there.
>>>>>>>
>>>>>>> In any case, your best bet is to somehow reproduce the bug consistently and then figure out which memory access is causing the fault.
>>>>>>
>>>>>> My application isn't doing anything with mmap or shared memory, and my direct and indirect dependencies don't seem to be anything like that either. It's limited to pretty much the nats.io client, the gnatsd embedded server, and a thrift rpc.
>>>>>>
>>>>>> It seems so random that I doubt I could get a reproducible crash. So I can really only try testing this on go 1.11 instead, to see if any of the GC work in 1.12 causes this.
>>>>>>
>>>>>>> On Monday, 29 April 2019 21:59:34 UTC-7, Justin Israel wrote:
>>>>>>>>
>>>>>>>> On Thursday, November 29, 2018 at 6:22:56 PM UTC+13, Justin Israel wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Nov 29, 2018 at 6:20 PM Justin Israel <justin...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 29, 2018 at 5:32 PM Ian Lance Taylor <ia...@golang.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 28, 2018 at 7:18 PM Justin Israel <justin...@gmail.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > I've got a service that I have been testing quite a lot over the last few days. Only after I handed it off for some testing to a colleague was he able to produce a SIGBUS panic that I had not seen before:
>>>>>>>>>>> >
>>>>>>>>>>> > go 1.11.2 linux/amd64
>>>>>>>>>>> >
>>>>>>>>>>> > The service does set up its own SIGINT/SIGTERM handling via the typical signal.Notify approach. The nature of the program is that it listens on nats.io message queues and receives requests to run tasks as sub-processes. My tests have been running between 40-200 of these instances over the course of a few days. But this panic occurred on a completely different machine than those I had been testing on...
>>>>>>>>>>> >
>>>>>>>>>>> > goroutine 1121 [runnable (scan)]:
>>>>>>>>>>> > fatal error: unexpected signal during runtime execution
>>>>>>>>>>> > panic during panic
>>>>>>>>>>> > [signal SIGBUS: bus error code=0x2 addr=0xfa2adc pc=0x451637]
>>>>>>>>>>> >
>>>>>>>>>>> > runtime stack:
>>>>>>>>>>> > runtime.throw(0xcf7fe3, 0x2a)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:608 +0x72
>>>>>>>>>>> > runtime.sigpanic()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/signal_unix.go:374 +0x2f2
>>>>>>>>>>> > runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0004baa80, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:190 +0x377
>>>>>>>>>>> > runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0004baa80, 0x0)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:728 +0xf3
>>>>>>>>>>> > runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0004baa80)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:682 +0x52
>>>>>>>>>>> > runtime.tracebackothers(0xc00012e780)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:947 +0x187
>>>>>>>>>>> > runtime.dopanic_m(0xc00012e780, 0x42dcc2, 0x7f83f6ffc808, 0x1)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:805 +0x2aa
>>>>>>>>>>> > runtime.fatalthrow.func1()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:663 +0x5f
>>>>>>>>>>> > runtime.fatalthrow()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:660 +0x57
>>>>>>>>>>> > runtime.throw(0xcf7fe3, 0x2a)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:608 +0x72
>>>>>>>>>>> > runtime.sigpanic()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/signal_unix.go:374 +0x2f2
>>>>>>>>>>> > runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0004baa80, 0x0, 0x0, 0x7fffffff, 0x7f83f6ffcd00, 0x0, 0x0, ...)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:190 +0x377
>>>>>>>>>>> > runtime.scanstack(0xc0004baa80, 0xc000031270)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:786 +0x15a
>>>>>>>>>>> > runtime.scang(0xc0004baa80, 0xc000031270)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/proc.go:947 +0x218
>>>>>>>>>>> > runtime.markroot.func1()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:264 +0x6d
>>>>>>>>>>> > runtime.markroot(0xc000031270, 0xc000000047)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:245 +0x309
>>>>>>>>>>> > runtime.gcDrain(0xc000031270, 0x6)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:882 +0x117
>>>>>>>>>>> > runtime.gcBgMarkWorker.func2()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgc.go:1858 +0x13f
>>>>>>>>>>> > runtime.systemstack(0x7f83f7ffeb90)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/asm_amd64.s:351 +0x66
>>>>>>>>>>> > runtime.mstart()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/proc.go:1229
>>>>>>>>>>> >
>>>>>>>>>>> > Much appreciated for any insight.
>>>>>>>>>>>
>>>>>>>>>>> Is the problem repeatable?
>>>>>>>>>>>
>>>>>>>>>>> It looks like it crashed while tracing back the stack during garbage collection, but I don't know why, since the panic was evidently able to trace back the stack just fine.
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply. Unfortunately it was rare and never happened in my own testing of thousands of runs of this service. The colleague that saw this crash on one of his workstations was not able to repro it after attempting another run of the workflow. I wasn't really sure how to debug this particular crash, since it was in the gc and I have seen a "panic during panic" before. Thought it might jump out at someone.
>>>>>>>>>
>>>>>>>>> Oops. I meant that I *haven't* seen a "panic during panic" before :-)
>>>>>>>>>
>>>>>>>>>>> Ian
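For context on the shutdown handling mentioned in the original report above ("the typical signal.Notify approach"), something along these lines is what I would assume it looks like. This is only my own minimal sketch, not the reporter's actual code:

    // Minimal sketch of the usual signal.Notify shutdown pattern.
    package main

    import (
        "fmt"
        "os"
        "os/signal"
        "syscall"
    )

    func main() {
        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

        // ... start servers, subscriptions, and workers here ...

        sig := <-sigs // block until asked to shut down
        fmt.Println("received", sig, "- shutting down")
        // ... perform graceful cleanup here ...
    }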
>>>>>>>> This is a follow up to the issue of seeing a SIGBUS in my application. While I still don't have a way to reproduce the problem, I have received reports from my users of another similar SIGBUS:
>>>>>>>>
>>>>>>>> unexpected fault address 0x7fdf50
>>>>>>>> fatal error: fault
>>>>>>>> [signal 0xb code=0x2 addr=0x7fdf50 pc=0x7fdf50]
>>>>>>>>
>>>>>>>> runtime.throw(0xad7840, 0x5)
>>>>>>>>         /s/go/1.12.1/src/runtime/panic.go:617 +0x72 fp=0xc000f75aa8 sp=0xc000f75a78 pc=0x444a5e
>>>>>>>> runtime.sigpanic()
>>>>>>>>         /s/go/1.12.1/src/runtime/sigpanic_unix.go:387 +0x47e fp=0xc000f75ad8 sp=0xc000f75aa8 pc=0x444a5e
>>>>>>>> project.com/project/obj.(*Server).newPushHandler.func1.1.1(0xc0008ea330, 0x25, 0x0)
>>>>>>>>
>>>>>>>> This is an anonymous inline function closure that was passed to a nats.io client topic subscription. If I am reading this correctly, it seems the address of the anonymous function is suddenly invalid?
>>>>>>>>
>>>>>>>> i.e.
>>>>>>>>
>>>>>>>> go func() {
>>>>>>>>     ...
>>>>>>>>     someChan := make(chan bool, 1)
>>>>>>>>     natsConn.Subscribe(topic, func(_ string, typ Type) {
>>>>>>>>         ...
>>>>>>>>         someChan <- true
>>>>>>>>     })
>>>>>>>>     ...
>>>>>>>> }()
>>>>>>>>
>>>>>>>> Could I be triggering a bug based on this anonymous function closure in the goroutine? I can try defining things outside the goroutine, including the function. But honestly, without this being a reliable crash, I would be fumbling in the dark.
>>>>>>>>
>>>>>>>> Justin
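If you do try defining things outside the goroutine, I imagine it would end up looking roughly like the sketch below. This is only my own illustration: the topic name and handler body are placeholders, and it uses the plain *nats.Conn Subscribe/MsgHandler signature rather than whatever encoded handler your real code uses.

    // Sketch: channel and callback built outside the goroutine, not as an inline closure.
    package main

    import (
        "log"

        nats "github.com/nats-io/go-nats"
    )

    // makeHandler returns the subscription callback, closing over the channel explicitly.
    func makeHandler(someChan chan<- bool) nats.MsgHandler {
        return func(m *nats.Msg) {
            // ... decode m.Data and do the real work here ...
            someChan <- true
        }
    }

    func main() {
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        someChan := make(chan bool, 1)
        handler := makeHandler(someChan) // closure constructed outside the goroutine

        go func() {
            if _, err := nc.Subscribe("some.topic", handler); err != nil {
                log.Println("subscribe failed:", err)
                return
            }
            <-someChan // wait for the first message
        }()

        select {} // keep the process alive for the sake of the sketch
    }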