I have no experience with this code...but a lot of debugging experience. Since you think you know the general problem and have both hands inside the patient already, why not (for now) modify the GC scanner to always look at your batch’s pointer list and explicitly avoid messing with them?
Then you should be able to (a) validate your sense of the problem, (b) run with GC on, and (c) take diagnostic stack traces precisely from the “yes it is in the span of protected batch-related addresses” line and thereby see the call stack that presently leads to harm.

Just a thought, maybe not a good one.

On Mon, Apr 8, 2019 at 11:50 AM Tharen Abela <abela.tha...@gmail.com> wrote:

> I am keeping an `allb` slice, and with that I did see it occasionally
> succeed.
>
> I am using the binarytree
> <https://gitlab.com/AbelThar/go.batch/blob/b10ef431c29b01fa7568a7bf9712a0286033266f/batching/src/runnables/binarytree.go>
> test, since it is an issue regarding the GC.
> In fact, running it with GOGC=off, or keeping a slice of pointers in
> the program, also succeeds consistently.
>
> What I do know is that when I allocate a batch, I keep a raw pointer in
> the slice, and it is never popped or removed from there at any point.
> P's will keep their own batch using a uintptr, and the others are stored
> in either a global batch queue or a queue of empty batches, the same as
> gQueue but for the batch type buintptr: all of which are irrelevant to
> the GC.
>
> Now I modified the batch allocation to show me the pointer of `allb` and
> the new batch allocated:
>
> // Allocate a new batch
> //go:nosplit
> //go:yeswritebarrierrec
> func allocb() *b {
>     // Break the cycle by doing acquirem/releasem around new(b).
>     // The acquirem/releasem increments m.locks during new(b),
>     // which keeps the garbage collector from being invoked.
>     mp := acquirem()
>
>     var bp *b
>
>     bp = new(b)
>     allb = append(allb, bp)
>     print("allb: ", allb, ", bp:", bp, "\n")
>
>     releasem(mp)
>     return bp
> }
>
> With GOGC=off I get that 6 batches have been created:
>
> GOGC=off GODEBUG=gccheckmark=1 gobatch run ./binarytree.go
> allb: [1/1]0xc000010010, bp:0xc0000160f0
> allb: [2/2]0xc000012010, bp:0xc000016100
> allb: [3/4]0xc00000e020, bp:0xc000016110
> allb: [4/4]0xc00000e020, bp:0xc000016120
> allb: [5/8]0xc000062000, bp:0xc000060000
> allb: [6/8]0xc000062000, bp:0xc000060010
>
> When it does succeed with the GC on, it consistently takes 13 batches,
> which I find rather odd.
>
> GODEBUG=gccheckmark=1 gobatch run ./binarytree.go
> allb: [1/1]0xc000010010, bp:0xc0000160f0
> allb: [2/2]0xc000012010, bp:0xc000016100
> allb: [3/4]0xc00000e020, bp:0xc000016110
> allb: [4/4]0xc00000e020, bp:0xc000016120
> allb: [5/8]0xc000062000, bp:0xc000060000
> allb: [6/8]0xc000062000, bp:0xc000060010
> allb: [7/8]0xc000062000, bp:0xc000016150
> allb: [8/8]0xc000062000, bp:0xc0004b8000
> allb: [9/16]0xc000510000, bp:0xc00044a040
> allb: [10/16]0xc000510000, bp:0xc000514000
> allb: [11/16]0xc000510000, bp:0xc000540000
> allb: [12/16]0xc000510000, bp:0xc0004b8010
> allb: [13/16]0xc000510000, bp:0xc0004b8020
>
> Now when it crashes it returns the following: (full stack trace on
> pastebin) <https://pastebin.com/40iYNQrh>
>
> GODEBUG=gccheckmark=1 gobatch run ./binarytree.go
> allb: [1/1]0xc000010010, bp:0xc0000160f0
> allb: [2/2]0xc000012010, bp:0xc000016100
> allb: [3/4]0xc00000e020, bp:0xc000016110
> allb: [4/4]0xc00000e020, bp:0xc000016120
> allb: [5/8]0xc00006a000, bp:0xc000068000
> allb: [6/8]0xc00006a000, bp:0xc000068010
> allb: [7/8]0xc00006a000, bp:0xc000016170
> allb: [8/8]0xc00006a000, bp:0xc0004b4000
> allb: [9/16]0xc0004ea000, bp:0xc0004b4010
> allb: [10/16]0xc0004ea000, bp:0xc0004ee000
> allb: [11/16]0xc0004ea000, bp:0xc000448040
> runtime: marking free object 0xc000448040 found at *(0xc0004ea000+0x50)
> base=0xc0004ea000 s.base()=0xc0004ea000 s.limit=0xc0004ec000 s.spanclass=18 s.elemsize=128 s.state=mSpanInUse
>  *(base+0) = 0xc0000160f0
>  *(base+8) = 0xc000016100
>  *(base+16) = 0xc000016110
>  *(base+24) = 0xc000016120
>  *(base+32) = 0xc000068000
>  *(base+40) = 0xc000068010
>  *(base+48) = 0xc000016170
>  *(base+56) = 0xc0004b4000
>  *(base+64) = 0xc0004b4010
>  *(base+72) = 0xc0004ee000
>  *(base+80) = 0xc000448040 <==
>  *(base+88) = 0x0
>  *(base+96) = 0x0
>  *(base+104) = 0x0
>  *(base+112) = 0x0
>  *(base+120) = 0x0
> obj=0xc000448040 s.base()=0xc000448000 s.limit=0xc00044a000 s.spanclass=5 s.elemsize=16 s.state=mSpanInUse
>  *(obj+0) = 0x0
>  *(obj+8) = 0xc0004ee000
> fatal error: marking free object
>
> At this point I'm assuming the error has already occurred, and the trace
> just shows when it was detected.
>
> What I do notice is that it's not always at the same level when the error
> is noticed:
> for the full stack trace, it was when the depth was 3, ...
>
> goroutine 1 [runnable]:
> runtime.newobject(0x464820, 0x2)
>     /go.batch/src/runtime/malloc.go:1067 +0x51 fp=0xc000084b20 sp=0xc000084b18 pc=0x40a701
> main.bottomUpTree(0xffffffffffffbefb, 0x3, 0xc0008bb840)
>     /go.batch/batching/src/runnables/binarytree.go:33 +0x91 fp=0xc000084b60 sp=0xc000084b20 pc=0x44f011
>
> ...and in another stack trace <https://pastebin.com/AukWCxAe>, it occurred
> when the depth was 1.
>
> I ran it some more times, and it always seemed to crash after batch 11,
> and the depth ranged from 0 to 3.
>
> The stack trace for depth 0 <https://pastebin.com/ZMG02MM3> of goroutine
> 1 was more interesting, in that it did not trigger at `newobject`.
>
> goroutine 1 [GC assist marking]:
> runtime.systemstack_switch()
>     /go.batch/src/runtime/asm_amd64.s:311 fp=0xc000086930 sp=0xc000086928 pc=0x446d30
> runtime.gcAssistAlloc(0xc000000180)
>     /go.batch/src/runtime/mgcmark.go:422 +0x15c fp=0xc000086990 sp=0xc000086930 pc=0x416e5c
> runtime.mallocgc(0x18, 0x464820, 0x1, 0x18)
>     /go.batch/src/runtime/malloc.go:843 +0x8e6 fp=0xc000086a30 sp=0xc000086990 pc=0x40a456
> runtime.newobject(0x464820, 0xc00045e000)
>     /go.batch/src/runtime/malloc.go:1068 +0x38 fp=0xc000086a60 sp=0xc000086a30 pc=0x40a6e8
> main.bottomUpTree(0xfffffffffffdec85, 0x0, 0x20)
>     /go.batch/batching/src/runnables/binarytree.go:29 +0xfc fp=0xc000086aa0 sp=0xc000086a60 pc=0x44f07c
>
> I think at this point I may be overthinking it a bit, and my lack of
> experience is more apparent.
> If there is something else I should be looking into, I am open to ideas.
>
> On Monday, 8 April 2019 19:42:56 UTC+2, Ian Lance Taylor wrote:
>
>> On Sun, Apr 7, 2019 at 12:30 PM Tharen Abela <abela...@gmail.com> wrote:
>> >
>> > The gist of the problem is that I am allocating an object in the
>> > runtime, (which I refer to as the batch struct), and the GC is
>> > deallocating the object, even though a reference is being kept in a
>> > slice (similar to allp and allm).
>> > While allocating, I call acquirem to prevent the GC being triggered,
>> > during which I append the batch pointer to the slice.
>> >
>> > From running `GODEBUG=gccheckmark=1` I know that the batch object
>> > allocated, was being freed, yet when it crashes it says the object is
>> > being marked (hence marking a freed object).
>> >
>> > Now my intention is to keep the batch allocation till the end of the
>> > program, keeping it in an extra batch queue, so it should not be freed.
>> >
>> > Thinking about it now, I am not sure if the deallocation occurs after
>> > the work of the program is finished and is winding down, by
>> > de-allocating everything, but a reference is still kept in allb, so a
>> > double free will occur, OR,
>> > what I have been assuming so far, that this takes place while work is
>> > incomplete so the GC is incorrectly de-allocating a batch object still
>> > in use.
>> >
>> > Another thing to take note of, is that the batch in P is referenced by
>> > a uintptr, I'm not sure how that might affect it.
>>
>> That is going to be your problem. The GC only tracks values with live
>> pointers. A value of type `uintptr` can not be a live pointer. The
>> runtime can only get away with the `guintptr`, `muintptr` and
>> `puintptr` types because it knows that there are existing other
>> pointers to all G and P values (in the allgs and allp slices and the
>> allm linked list). If there is ever any moment that your batch
>> objects are only referenced by `uintptr` values and not by a value of
>> pointer type, then the garbage collector can collect it.
>>
>> Ian
>>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
Michael T. Jones
michael.jo...@gmail.com
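
Ian's point about `uintptr` can be observed outside the runtime. The following is only a minimal sketch in ordinary user code, not the go.batch implementation: `batch`, `allocBatch`, and `collected` are hypothetical names, the local `allb` merely imitates the slice discussed above, and a finalizer is used purely as a way to notice when the collector reclaims an object. Assuming a standard Go toolchain, a batch whose address survives only as a `uintptr` is typically reported as collected after a forced GC, while one that is also referenced from the typed `allb` slice is not.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
	"unsafe"
)

// batch is a hypothetical stand-in for the runtime's b struct.
// It contains a pointer field so it is not served by the tiny allocator,
// whose objects are not guaranteed to have their finalizers run.
type batch struct {
	next *batch
	id   int
}

// allb imitates the registry from the thread: a slice of typed pointers,
// which the garbage collector scans.
var allb []*batch

// allocBatch allocates a batch and returns only its address as a uintptr.
// When pin is true, a typed pointer is also stored in allb.
// The finalizer sends the batch id on freed once the GC reclaims the object.
//
//go:noinline
func allocBatch(id int, pin bool, freed chan<- int) uintptr {
	bp := &batch{id: id}
	runtime.SetFinalizer(bp, func(b *batch) { freed <- b.id })
	if pin {
		allb = append(allb, bp)
	}
	return uintptr(unsafe.Pointer(bp))
}

// collected forces several GC cycles and reports whether the finalizer ran.
func collected(freed <-chan int) bool {
	for i := 0; i < 20; i++ {
		runtime.GC()
		select {
		case <-freed:
			return true
		case <-time.After(10 * time.Millisecond):
		}
	}
	return false
}

func main() {
	freedA := make(chan int, 1) // buffered so the finalizer never blocks
	freedB := make(chan int, 1)

	addrA := allocBatch(1, false, freedA) // address kept only as a uintptr
	addrB := allocBatch(2, true, freedB)  // typed pointer also kept in allb

	fmt.Printf("uintptr only (%#x): collected=%v\n", addrA, collected(freedA))
	fmt.Printf("also in allb (%#x): collected=%v\n", addrB, collected(freedB))
}
```

The same reasoning suggests checking whether every batch really stays reachable through `allb` (or another value of pointer type) for its whole lifetime, rather than only through the `uintptr` held by a P or a queue.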