> did manage to catch the processing running away again, this time at 300%, and I got some output with gctrace=1 as you suggested. I'm not sure how to read the lines though; could you advise Dave?
The give away is the frequency of the gc lines. gc 15 (the 15th gc event) happened at 1314 seconds into your programs execution this tells me that gc is likely not your problem. If it were your terminal would be swamped by gc log lines indicating the gc was running constantly. On Tuesday, 7 February 2017 16:42:28 UTC+11, Jason E. Aten wrote: > > the race is fixed in the latest push; > a572f570c52f70c9518bc1b3e3319ff9e2424885; it was an artifact of adding > logging levels, and would have affected only the tests. > > I did manage to catch the processing running away again, this time at > 300%, and I got some output with gctrace=1 as you suggested. I'm not sure > how to read the lines though; could you advise Dave? > > > runaway cpu behavior happening from at least: > Tue Feb 7 05:26:24 UTC 2017 - 2017/02/07 05:37:05.587710 > > [jaten@host]$ GODEBUG=gctrace=1 go/bin/vhaline.go1.8rc3 -info -addr > localhost:9001 -parent localhost:9000 -cpu > 2017/02/07 05:14:07 mid node 'a7fb658c' started on localhost:9001. my > parent is 'localhost:9000' > 2017/02/07 05:14:07 mid node 'a7fb658c' using ttl=3s and beat=1s. > 2017/02/07 05:14:07.136012 client.go:85: [pid 86063] replica (a7fb658c) > client dialing parent at 'localhost:9000' > 2017/02/07 05:14:07.136266 client.go:97: [pid 86063] replica (a7fb658c) > client reached parent at 'localhost:9000' > gc 1 @0.007s 1%: 0.18+0.39+0.093 ms clock, 0.36+0.23/0.56/0.23+0.18 ms > cpu, 17->17->16 MB, 18 MB goal, 16 P > gc 2 @0.008s 2%: 0.063+0.43+0.078 ms clock, 0.44+0.033/0.69/0.28+0.55 ms > cpu, 32->32->32 MB, 33 MB goal, 16 P > 2017/02/07 05:14:07.139570 replica.go:511: [pid 86063] replica (a7fb658c) > sees from parent: ToChildConnectAck. > 2017/02/07 05:14:07 pprof profiler providing web server on 'localhost:6061' > 2017/02/07 05:14:07 cmd/vhaline/main.go: contacted parent event. > 2017/02/07 05:14:09.139731 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(/). > 2017/02/07 05:14:09.139759 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000176418s/never > gc 3 @52.176s 0%: 0.065+0.24+0.082 ms clock, 0.58+0.15/0.62/0.47+0.74 ms > cpu, 49->49->48 MB, 65 MB goal, 16 P > gc 4 @52.177s 0%: 0.041+0.55+0.094 ms clock, 0.502017/02/07 > 05:14:59.307973 replica.go:742: [pid 86063] replica (a7fb658c) > NewChildConnect from child 4040ba06' at '127.0.0.1:46194' > +0.054/0.54/0.40+1.1 ms cpu, 64->64->64 MB, 97 MB goal, 16 P > 2017/02/07 05:14:59 cmd/vhaline/main.go: contacted child event. > 2017/02/07 05:15:16.160638 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(4040ba06/127.0.0.1:46194). > 2017/02/07 05:15:16.160673 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000166804s/1.000144038s > 2017/02/07 05:16:24.182488 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(4040ba06/127.0.0.1:46194). > 2017/02/07 05:16:24.182518 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000148515s/1.000133515s > scvg0: inuse: 68, idle: 0, sys: 69, released: 0, consumed: 69 (MB) > GC forced > gc 5 @172.183s 0%: 0.089+0.30+0.086 ms clock, 1.2+0/0.74/1.0+1.2 ms cpu, > 67->67->64 MB, 129 MB goal, 16 P > 2017/02/07 05:17:24.201369 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(4040ba06/127.0.0.1:46194). > 2017/02/07 05:17:24.201404 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000158566s/1.000172823s > 2017/02/07 05:18:29.222380 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(4040ba06/127.0.0.1:46194). > 2017/02/07 05:18:29.222413 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000154109s/1.000142059s > GC forced > gc 6 @292.191s 0%: 0.18+0.31+0.10 ms clock, 2.7+0/0.72/0.89+1.6 ms cpu, > 67->67->64 MB, 129 MB goal, 16 P > scvg1: inuse: 67, idle: 2, sys: 69, released: 0, consumed: 69 (MB) > 2017/02/07 05:19:35.243248 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(4040ba06/127.0.0.1:46194). > 2017/02/07 05:19:35.243274 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000142593s/1.000160351s > 2017/02/07 05:20:45.265426 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(4040ba06/127.0.0.1:46194). > 2017/02/07 05:20:45.265455 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000176146s/1.000179458s > GC forced > gc 7 @412.200s 0%: 0.027+0.28+0.097 ms clock, 0.43+0/0.78/0.69+1.5 ms cpu, > 67->67->64 MB, 129 MB goal, 16 P > scvg2: inuse: 67, idle: 1, sys: 69, released: 0, consumed: 69 (MB) > 2017/02/07 05:21:54.287241 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(4040ba06/127.0.0.1:46194). > 2017/02/07 05:21:54.287275 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000167178s/2.000483842s > 2017/02/07 05:21:54.287396 replica.go:278: [pid 86063] replica (a7fb658c) > child connection lost > 2017/02/07 05:21:55.287620 replica.go:613: [pid 86063] replica (a7fb658c) > it's been 3.000864314s (> ttl == 3s) since last child contact, declaring > child '4040ba06' at '127.0.0.1:46194' to have failed. > 2017/02/07 05:21:55 cmd/vhaline/main.go:child failed event. > 2017/02/07 05:21:55.821703 replica.go:742: [pid 86063] replica (a7fb658c) > NewChildConnect from child 96bfad81' at '127.0.0.1:46534' > 2017/02/07 05:22:01.289558 replica.go:278: [pid 86063] replica (a7fb658c) > child connection lost > 2017/02/07 05:22:01.865167 replica.go:374: [pid 86063] replica (a7fb658c) > rejecting new child 'b305026f/127.0.0.1:46542' b/c already have '96bfad81/ > 127.0.0.1:46534' > gc 8 @474.734s 0%: 0.031+0.51+0.11 ms clock, 0.50+0.093/0.99/0.020+1.9 ms > cpu, 130->130->128 MB, 131 MB goal, 16 P > 2017/02/07 05:22:03.290073 replica.go:613: [pid 86063] replica (a7fb658c) > it's been 4.001142473s (> ttl == 3s) since last child contact, declaring > child '96bfad81' at '127.0.0.1:46534' to have failed. > 2017/02/07 05:22:03 cmd/vhaline/main.go:child failed event. > 2017/02/07 05:22:05.038889 replica.go:742: [pid 86063] replica (a7fb658c) > NewChildConnect from child 753dd7fe' at '127.0.0.1:46550' > 2017/02/07 05:23:04.309740 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:23:04.309767 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000190244s/1.000179191s > GC forced > gc 9 @594.743s 0%: 0.025+0.48+0.12 ms clock, 0.40+0/1.0/0.29+2.0 ms cpu, > 163->163->161 MB, 257 MB goal, 16 P > scvg3: inuse: 163, idle: 2, sys: 165, released: 0, consumed: 165 (MB) > 2017/02/07 05:24:12.331876 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:24:12.331911 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000161445s/1.000345842s > 2017/02/07 05:25:16.352877 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:25:16.352912 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000113785s/1.000117716s > GC forced > gc 10 @714.754s 0%: 0.019+0.34+0.12 ms clock, 0.31+0/1.0/0.52+2.0 ms cpu, > 164->164->161 MB, 322 MB goal, 16 P > 2017/02/07 05:26:20.373956 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:26:20.373982 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000188576s/1.000199715s > scvg4: inuse: 164, idle: 2, sys: 166, released: 0, consumed: 166 (MB) > > ### observing the 300% cpu spike from here on... > > 2017/02/07 05:27:26.395974 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:27:26.396009 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000216294s/1.000187739s > GC forced > gc 11 @834.754s 0%: 0.047+0.35+0.11 ms clock, 0.76+0/1.0/0.55+1.8 ms cpu, > 164->164->161 MB, 323 MB goal, 16 P > 2017/02/07 05:28:36.418341 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:28:36.418369 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000121649s/1.000160242s > scvg5: inuse: 164, idle: 1, sys: 166, released: 0, consumed: 166 (MB) > 2017/02/07 05:29:38.438866 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:29:38.438893 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000151323s/1.000168023s > GC forced > gc 12 @954.763s 0%: 0.025+0.36+0.11 ms clock, 0.40+0/1.0/0.54+1.8 ms cpu, > 166->166->162 MB, 323 MB goal, 16 P > 2017/02/07 05:30:42.459870 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:30:42.459896 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000071789s/1.000163893s > scvg6: inuse: 166, idle: 1, sys: 168, released: 0, consumed: 168 (MB) > 2017/02/07 05:31:46.480874 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:31:46.480901 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000216787s/1.000181407s > GC forced > gc 13 @1074.772s 0%: 0.034+0.39+0.12 ms clock, 0.54+0/1.0/0.37+2.0 ms cpu, > 165->165->162 MB, 325 MB goal, 16 P > 2017/02/07 05:32:52.502406 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:32:52.502432 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000112204s/1.00011439s > 2017/02/07 05:33:54.523284 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:33:54.523312 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000156022s/1.000152209s > GC forced > gc 14 @1194.775s 0%: 0.024+0.32+0.12 ms clock, 0.39+0/0.98/0.50+1.9 ms > cpu, 169->169->164 MB, 325 MB goal, 16 P > scvg7: inuse: 167, idle: 4, sys: 171, released: 0, consumed: 171 (MB) > 2017/02/07 05:34:57.544262 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:34:57.544288 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000198885s/1.000148081s > GC forced > gc 15 @1314.779s 0%: 0.031+0.29+0.11 ms clock, 0.49+0/0.99/0.47+1.7 ms > cpu, 167->167->164 MB, 329 MB goal, 16 P > 2017/02/07 05:36:02.566141 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:36:02.566168 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.00034139s/1.000175237s > scvg8: 0 MB released > scvg8: inuse: 167, idle: 4, sys: 171, released: 0, consumed: 171 (MB) > 2017/02/07 05:37:05.587678 replica.go:587: [pid 86063] replica (a7fb658c) > line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) > -> child(753dd7fe/127.0.0.1:46550). > 2017/02/07 05:37:05.587710 replica.go:598: [pid 86063] replica (a7fb658c) > last-ping (parent/child): 1.000171768s/1.000188289s > > ### the 300% cpu peg is still in progress at this point. > > none of the go1.7.4 processes are showing the cpu peg. > > > > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
