That's right Kurtis, we don't wait or check if the prior go routine which
fired the big query has completed or not.
The rationale there was the data which comes out from big query is not much
and should not take more than 2 mins.

To be precise every 2 mins we will fire 6 independent Big queries & 24
independent elastic queries each in its own go routine.
The data coming out is cached locally and when the user requests that data
we serve it out of cache.

What we are unable to understand is why would this cause http server to
stop responding to user requests which are not coming for this data.
Regular requests were also not getting served. Since the system was
responding we had to send sig abort and collect the back trace.


goroutine 3332366 [select, *79 minutes*]:
net/http.(*http2ClientConn).*roundTrip*(0xc003759b00, 0xc00050b700, 0x0,
0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/h2_bundle.go:7573 +0x9a0
net/http.(*http2Transport).*RoundTripOpt*(0xc0000a0480, 0xc00050b700,
0xc0037a8600, 0x74aba5, 0x0, 0xc0001c8000)
        /usr/local/go/src/net/http/h2_bundle.go:6936 +0x172
net/http.(*http2Transport).*RoundTrip*(...)
        /usr/local/go/src/net/http/h2_bundle.go:6897
net/http.http2noDialH2RoundTripper.*RoundTrip*(0xc0000a0480, 0xc00050b700,
0xc0047e97a0, 0x5, 0xc0001c8088)
        /usr/local/go/src/net/http/h2_bundle.go:9032 +0x3e
net/http.(*Transport).*roundTrip*(0x5b48c00, 0xc00050b700, 0x2b3ce20,
0x5b48801, 0xc0048deb00)
        /usr/local/go/src/net/http/transport.go:485 +0xdc8
net/http.(*Transport).*RoundTrip*(0x5b48c00, 0xc00050b700, 0x3301658, 0xa,
0xc0048debd0)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x35
google.golang.org/api/transport/http.(*parameterTransport).*RoundTrip*(0xc0028d9e40,
0xc00050b600, 0x5, 0x5, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/transport/http/dial.go:145 +0x224
go.opencensus.io/plugin/ochttp.(*traceTransport).*RoundTrip*(0xc00284d6c0,
0xc00050b600, 0xc002080980, 0x1, 0x1)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/trace.go:84
+0x456
go.opencensus.io/plugin/ochttp.statsTransport.*RoundTrip*(0x38d4e60,
0xc00284d6c0, 0xc00050b400, 0xc0025e27e0, 0xc0025e27e0, 0x2ecb720)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client_stats.go:57
+0x5cc
go.opencensus.io/plugin/ochttp.(*Transport).*RoundTrip*(0xc003731d60,
0xc00050b400, 0x0, 0x0, 0x150)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client.go:99
+0x20e
golang.org/x/oauth2.(*Transport).*RoundTrip*(0xc00376bc20, 0xc00050b300,
0x0, 0x0, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
golang.org/x/oauth2@v0.0.0-20200107190931-bf48bf16ab8d/transport.go:55 +0xf4
net/http.send(0xc00050b300, 0x38d5060, 0xc00376bc20, 0x0, 0x0, 0x0,
0xc0036a64c8, 0x1d6cede, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:250 +0x443
net/http.(*Client).send(0xc003afdb60, 0xc00050b300, 0x0, 0x0, 0x0,
0xc0036a64c8, 0x0, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:174 +0xfa
net/http.(*Client).do(0xc003afdb60, 0xc00050b300, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:641 +0x3ce
net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:509
google.golang.org/api/internal/gensupport.send(0x393ede0, 0xc00346b8c0,
0xc003afdb60, 0xc00050b200, 0x5b7fee0, 0xc0037a9300, 0x1d6f876)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/internal/gensupport/send.go:67 +0xef
google.golang.org/api/internal/gensupport.SendRequest(0x393ede0,
0xc00346b8c0, 0xc003afdb60, 0xc00050b200, 0xc0037a95e8, 0x84, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/internal/gensupport/send.go:52 +0x193
google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).doRequest(0xc0037a98f8,
0x32f6d9c, 0x4, 0x0, 0x0, 0x203001)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/bigquery/v2/bigquery-gen.go:7768 +0x765
google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).Do(0xc0037a98f8,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/bigquery/v2/bigquery-gen.go:7780 +0xa8
cloud.google.com/go/bigquery.(*Job).waitForQuery.func1(0x6d183a,
0xc0037a97c0, 0xc0037a97e0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/bigquery@v1.4.0/job.go:320 +0x48
cloud.google.com/go/internal.retry(0x393ede0, 0xc00346b8c0, 0x3b9aca00,
0xdf8475800, 0x4000000000000000, 0x0, 0xc0037a98e0, 0x3433818, 0x0,
0x203000)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go@v0.53.0/internal/retry.go:38 +0x48
cloud.google.com/go/internal.Retry(...)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go@v0.53.0/internal/retry.go:31
cloud.google.com/go/bigquery.(*Job).waitForQuery(0xc004080660, 0x393ede0,
0xc00346b8c0, 0xc001b74ca0, 0x7, 0x393ed60, 0xc0000d8018, 0x2d62240,
0x5b7e118, 0x3242840, ...)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/*bigquery*@v1.4.0/job.go:319 +0x4df
cloud.google.com/go/bigquery.(*Job).read(0xc004080660, 0x393ede0,
0xc00346b8c0, 0xc0037a9b38, 0x3430808, 0x0, 0x0, 0x393ede0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/*bigquery*@v1.4.0/job.go:293 +0xbd
cloud.google.com/go/bigquery.(*Job).Read(0xc004080660, 0x393ed60,
0xc0000d8018, 0x0, 0x0, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/*bigquery*@v1.4.0/job.go:280 +0x14c
cloud.google.com/go/bigquery.(*Query).Read(0xc003615ca0, 0x393ed60,
0xc0000d8018, 0xc003743a00, 0x1, 0x1)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/*bigquery*@v1.4.0/query.go:333 +0x91
git.fusion.io/fusionio/fusion/utils.git/bigquery.(*BigQueryWrapper).GetTemplateSQL(0xc003afdbc0,
0x331199e, 0x11, 0xc003afda70, 0x393f0e0, 0xc00010e1c0, 0xc003afdbc0, 0x0,
0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
git.fusion.io/fusionio/fusion/utils.git@v0.0.0-20200813012118-9675cdb4e965/bigquery/bigquery.go:105
+0x25d
git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).executeQuery(0xc000010418,
0xc0028c3000, 0x331199e, 0x11, 0x24, 0x0, 0x0, 0x0)
        /builds/fusionio/fusion/controller/stats/statsdao.go:401 +0xe3
git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).GetTeamAppExpScore(..
.)
        /builds/fusionio/fusion/controller/stats/statsdao.go:545
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetTeamAppExpScore(0xc001b54d40,
0xc0028c3000, 0x0, 0x42fd7a, 0x0, 0x0)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:459 +0x4e
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runQuery(0xc001b54d40,
0x330e1b0, 0xf, 0xc003da9870, 0xc0024ef080, 0xc001c4fcf0)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:464 +0x7e
created by
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:485 +0x1a4

goroutine 3332154 [select, *79 minutes*]:
net/http.(*http2ClientConn).*roundTrip*(0xc003759b00, 0xc00050a500, 0x0,
0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/h2_bundle.go:7573 +0x9a0
net/http.(*http2Transport).*RoundTripOpt*(0xc0000a0480, 0xc00050a500,
0xc002de4600, 0x74aba5, 0x0, 0xc0001c8000)
        /usr/local/go/src/net/http/h2_bundle.go:6936 +0x172
net/http.(*http2Transport).*RoundTrip*(...)
        /usr/local/go/src/net/http/h2_bundle.go:6897
net/http.http2noDialH2RoundTripper.*RoundTrip*(0xc0000a0480, 0xc00050a500,
0xc0047e95f0, 0x5, 0xc0001c8088)
        /usr/local/go/src/net/http/h2_bundle.go:9032 +0x3e
net/http.(*Transport).*roundTrip*(0x5b48c00, 0xc00050a500, 0x2b3ce20,
0x5b48801, 0xc00170db80)
        /usr/local/go/src/net/http/transport.go:485 +0xdc8
net/http.(*Transport).*RoundTrip*(0x5b48c00, 0xc00050a500, 0x3301658, 0xa,
0xc00170dc20)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x35
google.golang.org/api/transport/http.(*parameterTransport).*RoundTrip*(0xc0020a7e00,
0xc00050a400, 0x5, 0x5, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/transport/http/dial.go:145 +0x224
go.opencensus.io/plugin/ochttp.(*traceTransport).*RoundTrip*(0xc00284c540,
0xc00050a400, 0xc000eb78a0, 0x1, 0x1)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/trace.go:84
+0x456
go.opencensus.io/plugin/ochttp.statsTransport.*RoundTrip*(0x38d4e60,
0xc00284c540, 0xc00050a200, 0xc00420b910, 0xc00420b910, 0x2ecb720)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client_stats.go:57
+0x5cc
go.opencensus.io/plugin/ochttp.(*Transport).*RoundTrip*(0xc00400d360,
0xc00050a200, 0x0, 0x0, 0x150)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client.go:99
+0x20e
golang.org/x/oauth2.(*Transport).RoundTrip(0xc0022aaee0, 0xc00050a100, 0x0,
0x0, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
golang.org/x/oauth2@v0.0.0-20200107190931-bf48bf16ab8d/transport.go:55 +0xf4
net/http.send(0xc00050a100, 0x38d5060, 0xc0022aaee0, 0x0, 0x0, 0x0,
0xc0036a6310, 0x1d6cede, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:250 +0x443
net/http.(*Client).send(0xc003b8e540, 0xc00050a100, 0x0, 0x0, 0x0,
0xc0036a6310, 0x0, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:174 +0xfa
net/http.(*Client).do(0xc003b8e540, 0xc00050a100, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:641 +0x3ce
net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:509
google.golang.org/api/internal/gensupport.send(0x393ede0, 0xc0031bb710,
0xc003b8e540, 0xc00050a000, 0x5b7fee0, 0xc002de5300, 0x1d6f876)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/internal/gensupport/send.go:67 +0xef
google.golang.org/api/internal/gensupport.SendRequest(0x393ede0,
0xc0031bb710, 0xc003b8e540, 0xc00050a000, 0xc002de55e8, 0x84, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/internal/gensupport/send.go:52 +0x193
google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).doRequest(0xc002de58f8,
0x32f6d9c, 0x4, 0x0, 0x0, 0x203000)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/*bigquery*/v2/bigquery-gen.go:7768 +0x765
google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).Do(0xc002de58f8,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/*bigquery*/v2/bigquery-gen.go:7780 +0xa8
cloud.google.com/go/bigquery.(*Job).waitForQuery.func1(0x6d183a,
0xc002de57c0, 0xc002de57e0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/*bigquery*@v1.4.0/job.go:320 +0x48
cloud.google.com/go/internal.retry(0x393ede0, 0xc0031bb710, 0x3b9aca00,
0xdf8475800, 0x4000000000000000, 0x0, 0xc002de58e0, 0x3433818, 0x0,
0x203000)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go@v0.53.0/internal/retry.go:38 +0x48
cloud.google.com/go/internal.Retry(...)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go@v0.53.0/internal/retry.go:31
cloud.google.com/go/bigquery.(*Job).waitForQuery(0xc003491980, 0x393ede0,
0xc0031bb710, 0xc00233d700, 0x7, 0x393ed60, 0xc0000d8018, 0x2d62240,
0x5b7e118, 0x3242840, ...)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/bigquery@v1.4.0/job.go:319 +0x4df
cloud.google.com/go/bigquery.(*Job).read(0xc003491980, 0x393ede0,
0xc0031bb710, 0xc002de5b38, 0x3430808, 0x0, 0x0, 0x393ede0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/bigquery@v1.4.0/job.go:293 +0xbd
cloud.google.com/go/bigquery.(*Job).Read(0xc003491980, 0x393ed60,
0xc0000d8018, 0x0, 0x0, 0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/bigquery@v1.4.0/job.go:280 +0x14c
cloud.google.com/go/bigquery.(*Query).Read(0xc003c79ca0, 0x393ed60,
0xc0000d8018, 0xc001e36200, 0x1, 0x1)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/bigquery@v1.4.0/query.go:333 +0x91
git.fusion.io/fusionio/fusion/utils.git/bigquery.(*BigQueryWrapper).GetTemplateSQL(0xc003b8e5a0,
0x331199e, 0x11, 0xc003b8e450, 0x393f0e0, 0xc00010e1c0, 0xc003b8e5a0, 0x0,
0x0)
        /builds/fusionio/fusion/controller/gopath/pkg/mod/
git.fusion.io/fusionio/fusion/utils.git@v0.0.0-20200813012118-9675cdb4e965/bigquery/bigquery.go:105
+0x25d
git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).executeQuery(0xc000010418,
0xc0020cb600, 0x331199e, 0x11, 0xc000eb0000, 0xbd, 0x400, 0x88)
        /builds/fusionio/fusion/controller/stats/statsdao.go:401 +0xe3
git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).GetTeamAppExpScore(..
.)
        /builds/fusionio/fusion/controller/stats/statsdao.go:545
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetTeamAppExpScore(0xc001b54d40,
0xc0020cb600, 0x44ea57, 0x42fd7a, 0xc001470558, 0x101)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:459 +0x4e
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runQuery(0xc001b54d40,
0x330e1b0, 0xf, 0xc0030d2a80, 0xc0035995c0, 0xc0015ede40)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:464 +0x7e
created by
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:485 +0x1a4

Corresponding go routines waiting on wait groups.

goroutine 3332149 [semacquire, *79 minutes*]:
sync.runtime_Semacquire(0xc0015ede48)
        /usr/local/go/src/runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc0015ede40)
        /usr/local/go/src/sync/waitgroup.go:130 +0x64
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).*GetMainUi*(0xc001b54d40,
0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).
*fetchMainUiTeamInterval*(0xc001b56780, 0xc002356ba0, 0x24, 0x32f7b78, 0x5)
        /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
created by
git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
        /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8

goroutine 3332131 [semacquire, *79 minutes*]:
sync.runtime_Semacquire(0xc001c4fcf8)
        /usr/local/go/src/runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc001c4fcf0)
        /usr/local/go/src/sync/waitgroup.go:130 +0x64
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).*GetMainUi*(0xc001b54d40,
0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
        /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).
*fetchMainUiTeamInterval*(0xc001b56780, 0xc002356810, 0x24, 0x32f7b78, 0x5)
        /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
created by
git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
        /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8




On Wed, Aug 26, 2020 at 9:28 PM Kurtis Rader <kra...@skepticism.us> wrote:

> On Wed, Aug 26, 2020 at 8:51 PM Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
>> Right, then it looks less likely that we are blocked on a mutex.
>>
>> Every 2 minutes we spin up a go routine which then in turn spins up a
>> bunch of go routines to collect data from big query & elastic (data
>> collector routines).
>> The 2 minutes go routine then in turn waits on waitgroup for data
>> collector routines to come back. So its not single go routine from our side
>> at least.
>> From backtrace we have from sigabort we see only one such data collector
>> go routine blocked and other 2 2 mins thread waiting on waitgroup.
>>
>
> Are you spinning up a Go routine every 2 minutes regardless of whether the
> previous instance had completed? Your comment is not clear whether you have
> one, or more than one, goroutine issuing a BigQuery query at the time you
> saw the problem and captured the backtraces. If you are, in fact, starting
> those Go routines without regard to whether the prior instance had
> completed that seems like a "yellow flag", "here be dragons", situation.
>
> --
> Kurtis Rader
> Caretaker of the exceptional canines Junior and Hank
>


-- 
-Siddhesh.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAMjfk%2Bhd%2Bu7XeG%3Ddg9Pe73S%2BOdQ9rRu68HGDDG3xp7a4Jrh8HA%40mail.gmail.com.

Reply via email to