I was wrong about the gc not getting memory back from the goroutines. I 
think it does get that through the gcResetMarkState function.....
So I don't think the # of goroutines are the issue......I'm sorry if I 
misled you

On Tuesday, 30 April 2019 00:28:29 UTC-7, vaastav anand wrote:
>
> The stack trace only lists goroutines that are not dead/not system 
> goroutines/not the goroutine that is calling the traceback function. 
> (src/runtime/traceback.go)
> Additionally, I don't think go reclaims any memory from dead goroutines. 
> allgs struct in src/runtime/proc.go file in the go source code holds all 
> the goroutines that have been created during the lifetime of the program 
> and it is all heap allocated. I don't know if the garbage collector 
> reclaims any of these dead goroutines. If it doesn't, which I don't think 
> it does because nothing ever seems to be removed from allgs.
>
> On Monday, 29 April 2019 23:54:54 UTC-7, Justin Israel wrote:
>>
>>
>>
>> On Tue, Apr 30, 2019 at 6:33 PM vaastav anand <vaastav...@gmail.com> 
>> wrote:
>>
>>> I have encountered a SIGBUS with go before but I was hacking inside the 
>>> runtime and using shared mem with mmap.
>>>
>>> goroutines are assigned IDs incrementally and each goroutine at bare 
>>> minimum has 2.1KB stack space in go1.11 down from 2.7KB in go1.10 if I 
>>> recall correctly. So, at the very least at that point you could have easily 
>>> burnt through at least 7.5GB of memory. I am not sure what could happen if 
>>> you somehow exceed the amount of memory available. Seems like that is a 
>>> test you could write and see if launching more goroutines than that could 
>>> fit in the size of memory could actually cause a SIGBUS.
>>>
>>
>> The stack trace only listed 282 goroutines, which seems about right 
>> considering the number of clients that are connected. Its about 3 
>> goroutines per client connection, plus the other stuff in the server. I 
>> think it just indicates that I have turned over a lot of client connections 
>> over time. 
>>  
>>
>>>
>>> On Monday, 29 April 2019 23:25:52 UTC-7, Justin Israel wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Apr 30, 2019 at 6:09 PM vaastav anand <vaastav...@gmail.com> 
>>>> wrote:
>>>>
>>>>> Ok, so in the 2nd piece of code you posted, is some request being 
>>>>> pushed onto some OS queue? If so, is it possible that you may be maxing 
>>>>> the 
>>>>> queue out and then pushing something else into it and that could cause a 
>>>>> SIGBUS as well.... This seems super farfetched tho but it is hard to 
>>>>> debug 
>>>>> without really knowing what the application might really be doing.
>>>>>
>>>>
>>>> I want to say that I really appreciate you taking the time to try and 
>>>> give me some possible ideas, even though this is a really vague problem. I 
>>>> had only hoped someone had encountered something similar. 
>>>>
>>>> So that line in the SIGBUS crash is just trying to add a subscription 
>>>> to a message topic callback in the nats client connection:
>>>> https://godoc.org/github.com/nats-io/go-nats#Conn.Subscribe 
>>>> It's pretty high level logic at my application level. 
>>>>
>>>> One thing that stood out to me was that in the crash, the goroutine id 
>>>> number was 3538668. I had to double check to confirm that the go runtime 
>>>> just uses an insignificant increasing number. I guess it does indicate 
>>>> that 
>>>> the application turned over > 3 mil goroutines by that point. I'm 
>>>> wondering 
>>>> if this is caused by something in the gnatsd embedded server (
>>>> https://github.com/nats-io/gnatsd/tree/master/server) since most the 
>>>> goroutines do come from that, with all the client handling going on. If we 
>>>> are talking about something that is managing very large queues, that would 
>>>> be the one doing so in this application.
>>>>  
>>>>
>>>>>
>>>>> On Monday, 29 April 2019 22:57:40 UTC-7, Justin Israel wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 30, 2019 at 5:43 PM vaastav anand <vaastav...@gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> I'd be very surprised if the anonymous goroutine is the reason 
>>>>>>> behind a SIGBUS violation.
>>>>>>> So, if I remember SIGBUS correctly, it means that you are issuing a 
>>>>>>> read/write to a memory address which is not really addressable or it is 
>>>>>>> misaligned. I think the chances of the address being misaligned are 
>>>>>>> very 
>>>>>>> low.....so it really has to be a non-existent address.
>>>>>>> It can happen if you have try to access memory outside the region 
>>>>>>> mmaped into your application.
>>>>>>> If your application has any kind of mmap or shared memory access, I 
>>>>>>> would start there.
>>>>>>> In any case your best bet is to somehow reproduce the bug 
>>>>>>> consistently and then figure out which memory access is causing the 
>>>>>>> fault.
>>>>>>>
>>>>>>
>>>>>> My application isn't doing anything with mmap or shared memory, and 
>>>>>> my direct and indirect dependencies don't seem to be anything like that 
>>>>>> either. Its limited to pretty much nats.io client, gnatds embedded 
>>>>>> server, and a thrift rpc. 
>>>>>>
>>>>>> It seems so random that I doubt I could get a reproducible crash. So 
>>>>>> I can really only try testing this on go 1.11 instead to see if any of 
>>>>>> the 
>>>>>> GC work in 1.12 causes this.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Monday, 29 April 2019 21:59:34 UTC-7, Justin Israel wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday, November 29, 2018 at 6:22:56 PM UTC+13, Justin Israel 
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 29, 2018 at 6:20 PM Justin Israel <justin...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Nov 29, 2018 at 5:32 PM Ian Lance Taylor <
>>>>>>>>>> ia...@golang.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 28, 2018 at 7:18 PM Justin Israel <
>>>>>>>>>>> justin...@gmail.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > I've got a service that I have been testing quite a lot over 
>>>>>>>>>>> the last few days. Only after I handed it off for some testing to a 
>>>>>>>>>>> colleague, was he able to produce a SIGBUS panic that I had not 
>>>>>>>>>>> seen before:
>>>>>>>>>>> >
>>>>>>>>>>> > go 1.11.2 linux/amd64
>>>>>>>>>>> >
>>>>>>>>>>> > The service does set up its own SIGINT/SIGTERM handling via 
>>>>>>>>>>> the typical siginal.Notify approach. The nature of the program is 
>>>>>>>>>>> that it 
>>>>>>>>>>> listens on nats.io message queues, and receives requests to run 
>>>>>>>>>>> tasks as sub-processes. My tests have been running between 40-200 
>>>>>>>>>>> of these 
>>>>>>>>>>> instances over the course of a few days. But this panic occurred on 
>>>>>>>>>>> a 
>>>>>>>>>>> completely different machine that those I had been testing...
>>>>>>>>>>> >
>>>>>>>>>>> > goroutine 1121 [runnable (scan)]:
>>>>>>>>>>> > fatal error: unexpected signal during runtime execution
>>>>>>>>>>> > panic during panic
>>>>>>>>>>> > [signal SIGBUS: bus error code=0x2 addr=0xfa2adc pc=0x451637]
>>>>>>>>>>> >
>>>>>>>>>>> > runtime stack:
>>>>>>>>>>> > runtime.throw(0xcf7fe3, 0x2a)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:608 +0x72
>>>>>>>>>>> > runtime.sigpanic()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/signal_unix.go:374 
>>>>>>>>>>> +0x2f2
>>>>>>>>>>> > runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 
>>>>>>>>>>> 0x0, 0xc0004baa80, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:190 +0x377
>>>>>>>>>>> > runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 
>>>>>>>>>>> 0x0, 0xc0004baa80, 0x0)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:728 +0xf3
>>>>>>>>>>> > runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 
>>>>>>>>>>> 0xc0004baa80)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:682 +0x52
>>>>>>>>>>> > runtime.tracebackothers(0xc00012e780)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:947 +0x187
>>>>>>>>>>> > runtime.dopanic_m(0xc00012e780, 0x42dcc2, 0x7f83f6ffc808, 0x1)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:805 +0x2aa
>>>>>>>>>>> > runtime.fatalthrow.func1()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:663 +0x5f
>>>>>>>>>>> > runtime.fatalthrow()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:660 +0x57
>>>>>>>>>>> > runtime.throw(0xcf7fe3, 0x2a)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:608 +0x72
>>>>>>>>>>> > runtime.sigpanic()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/signal_unix.go:374 
>>>>>>>>>>> +0x2f2
>>>>>>>>>>> > runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 
>>>>>>>>>>> 0x0, 0xc0004baa80, 0x0, 0x0, 0x7fffffff, 0x7f83f6ffcd00, 0x0, 0x0, 
>>>>>>>>>>> ...)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:190 +0x377
>>>>>>>>>>> > runtime.scanstack(0xc0004baa80, 0xc000031270)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:786 +0x15a
>>>>>>>>>>> > runtime.scang(0xc0004baa80, 0xc000031270)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/proc.go:947 +0x218
>>>>>>>>>>> > runtime.markroot.func1()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:264 +0x6d
>>>>>>>>>>> > runtime.markroot(0xc000031270, 0xc000000047)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:245 +0x309
>>>>>>>>>>> > runtime.gcDrain(0xc000031270, 0x6)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:882 +0x117
>>>>>>>>>>> > runtime.gcBgMarkWorker.func2()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgc.go:1858 +0x13f
>>>>>>>>>>> > runtime.systemstack(0x7f83f7ffeb90)
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/asm_amd64.s:351 +0x66
>>>>>>>>>>> > runtime.mstart()
>>>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/proc.go:1229
>>>>>>>>>>> >
>>>>>>>>>>> > Much appreciated for any insight.
>>>>>>>>>>>
>>>>>>>>>>> Is the problem repeatable?
>>>>>>>>>>>
>>>>>>>>>>> It looks like it crashed while tracing back the stack during 
>>>>>>>>>>> garbage
>>>>>>>>>>> collection, but I don't know why since the panic was evidently 
>>>>>>>>>>> able to
>>>>>>>>>>> trace back the stack just fine.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply. Unfortunately it was rare and never 
>>>>>>>>>> happened in my own testing of thousands of runs of this service. The 
>>>>>>>>>> colleague that saw this crash on one of his workstations was not 
>>>>>>>>>> able to 
>>>>>>>>>> repro it after attempting another run of the workflow. I wasn't 
>>>>>>>>>> really sure 
>>>>>>>>>> how to debug this particular crash since it was in the gc and I have 
>>>>>>>>>> seen a 
>>>>>>>>>> "panic during panic" before. Thought it might jump out at someone.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oops. I meant that I *haven't* seen a "panic during panic" before 
>>>>>>>>> :-) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Ian
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> This is a follow up to the issue of seeing a SIGBUS in my 
>>>>>>>> application. While I still don't have a way to reproduce the problem, 
>>>>>>>> I 
>>>>>>>> have received reports from my users of another similar SIGBUS:
>>>>>>>>
>>>>>>>> unexpected fault address 0x7fdf50
>>>>>>>> fatal error: fault
>>>>>>>> [signal 0xb code=0x2 addr=0x7fdf50 pc=0x7fdf50]
>>>>>>>>
>>>>>>>> runtime.throw(0xad7840, 0x5)
>>>>>>>>         /s/go/1.12.1/src/runtime/panic.go:617 +0x72 fp=0xc000f75aa8 
>>>>>>>> sp=0xc000f75a78 pc=0x444a5e 
>>>>>>>> runtime.sigpanic()
>>>>>>>>         /s/go/1.12.1/src/runtime/sigpanic_unix.go:387 +0x47e 
>>>>>>>> fp=0xc000f75ad8 sp=0xc000f75aa8 pc=0x444a5e
>>>>>>>>
>>>>>>>> project.com/project/obj.(*Server).newPushHandler.func1.1.1(0xc0008ea330,
>>>>>>>>  
>>>>>>>> 0x25, 0x0)
>>>>>>>>
>>>>>>>> This is an anonymous inline function closure that was passed to a 
>>>>>>>> nats.io client topic subscription. If I am reading this correctly, 
>>>>>>>> it seems the address to the anonymous function is suddenly invalid?
>>>>>>>>
>>>>>>>> ie.
>>>>>>>>
>>>>>>>> go func() {
>>>>>>>>     ...
>>>>>>>>     someChan := make(chan bool, 1)
>>>>>>>>     natsConn.Subscribe(topic, func(_ string, typ Type) {
>>>>>>>>         ...
>>>>>>>>         someChan <- true
>>>>>>>>     })
>>>>>>>>     ...
>>>>>>>> }()
>>>>>>>>
>>>>>>>> Could I be triggering a bug based on this anonymous function 
>>>>>>>> closure in the goroutine? I can try defining things outside the 
>>>>>>>> goroutine, 
>>>>>>>> including the function. But honestly without this being a reliable 
>>>>>>>> crash I 
>>>>>>>> would be fumbling in the dark. 
>>>>>>>>
>>>>>>>> Justin 
>>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to a topic in 
>>>>>>> the Google Groups "golang-nuts" group.
>>>>>>> To unsubscribe from this topic, visit 
>>>>>>> https://groups.google.com/d/topic/golang-nuts/5tIkzXWCK0k/unsubscribe
>>>>>>> .
>>>>>>> To unsubscribe from this group and all its topics, send an email to 
>>>>>>> golan...@googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>> -- 
>>>>> You received this message because you are subscribed to a topic in the 
>>>>> Google Groups "golang-nuts" group.
>>>>> To unsubscribe from this topic, visit 
>>>>> https://groups.google.com/d/topic/golang-nuts/5tIkzXWCK0k/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to 
>>>>> golan...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>> -- 
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "golang-nuts" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/golang-nuts/5tIkzXWCK0k/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> golan...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>    - Add to Phrasebook
>    - No word lists for English -> English...
>       - Create a new word list...
>    - Copy
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to