Robert,

That's where the confusion is.
From the backtrace we see that two goroutines are blocked on the waitgroup.
The goroutine they are blocked on is itself waiting for BigQuery to return.

We create a new goroutine for every user request, so the other requests
shouldn't get blocked because of these two.

However, we do see one goroutine blocked on a futex. Is there a way to tell
where it originated?
When does Go use futex under the hood, so that we can investigate from that
angle?

goroutine 0 [idle]:
runtime.futex(0x5b607a8, 0x80, 0x0, 0x0, 0x0, 0x7ffc5dd1bd20, 0x437640,
0xc003638038, 0x7ffc5dd1bd48, 0x40c78f, ...)
        /usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
runtime.futexsleep(0x5b607a8, 0x0, 0xffffffffffffffff)
        /usr/local/go/src/runtime/os_linux.go:44 +0x46
runtime.notesleep(0x5b607a8)
        /usr/local/go/src/runtime/lock_futex.go:151 +0x9f
runtime.stopm()
        /usr/local/go/src/runtime/proc.go:1928 +0xc0
runtime.exitsyscall0(0xc003638000)
        /usr/local/go/src/runtime/proc.go:3140 +0x111
runtime.mcall(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b
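
For reference, next time we could pull a full goroutine dump from the running
process without killing it, to see where each goroutine was started from. A
rough sketch (untested; the SIGUSR1 trigger is just a placeholder mechanism):

package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpStacksOnSignal writes every goroutine's full stack, including the
// "created by" frame that shows where it was started, to stderr whenever
// the process receives SIGUSR1, so we don't have to recover with SIGABRT.
func dumpStacksOnSignal() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	go func() {
		for range c {
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2) // debug=2: full stacks
		}
	}()
}

func main() {
	dumpStacksOnSignal()
	select {} // stand-in for the real server
}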

On Wed, Aug 26, 2020 at 5:02 PM robert engels <reng...@ix.netcom.com> wrote:

> That should allow your server to clean up “dead” clients. Typically you
> use this in conjunction with a ‘keep alive’ in the protocol.
>
> I am doubtful that a bunch of dead clients hanging around would cause a
> CPU spike. You really don’t have too many Go routines/connections involved
> (I’ve worked with 1000’s of live connections).
>
> I would look at something else… I am guessing your synchronization is
> incorrect and your threads are blocking, and you have a few that are
> spinning expecting something to happen that never will (because the source
> of the event is blocked on the mutex/lock).
>
>
>
> On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar <siddhesh.dive...@gmail.com>
> wrote:
>
> Robert,
>
> I assume we can safely add these timeouts, based on what we consider to be
> a reasonable timeout for our clients?
>
> s.ReadTimeout = expTimeOut * time.Second
> s.WriteTimeout = expTimeOut * time.Second
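>
> Concretely, the shape we have in mind is roughly this (a sketch, not our real
> config; the address, handler, and the extra IdleTimeout/ReadHeaderTimeout are
> placeholders on top of the two lines above):
>
> package main
>
> import (
> 	"net/http"
> 	"time"
> )
>
> const expTimeOut = 60 // placeholder: whatever we consider reasonable
>
> func main() {
> 	s := &http.Server{
> 		Addr:              ":8443", // placeholder
> 		Handler:           http.DefaultServeMux,
> 		ReadTimeout:       expTimeOut * time.Second,
> 		WriteTimeout:      expTimeOut * time.Second,
> 		IdleTimeout:       120 * time.Second, // reap idle keep-alive connections
> 		ReadHeaderTimeout: 10 * time.Second,  // bound slow header reads
> 	}
> 	s.ListenAndServe()
> }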
>
> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
>> Both servers and data sources are in the cloud.
>> I would not say a lot of data; it's precomputed data, which shouldn't take
>> that long.
>>
>>
>> On Tue, Aug 25, 2020 at 11:25 AM Robert Engels <reng...@ix.netcom.com>
>> wrote:
>>
>>> Are you transferring a lot of data? Are the servers non-cloud hosted?
>>>
>>> You could be encountering “tcp stalls”.
>>>
>>> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
>>>
>>> Clients are over the internet.
>>>
>>> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels <reng...@ix.netcom.com>
>>> wrote:
>>>
>>>> The tcp protocol allows the connection to wait for hours. Go routines
>>>> stuck in wait do not burn CPU. Are the clients local or remote (over
>>>> internet)?
>>>>
>>>> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar <
>>>> siddhesh.dive...@gmail.com> wrote:
>>>>
>>>>
>>>> Robert,
>>>>
>>>> We will do the profiling next time we hit the issue and see what is
>>>> happening.
>>>> This was the first time we saw the issue, and we don't want to give up the
>>>> advantages of HTTP/2 without making sure it's the actual culprit.
>>>>
>>>> Do you think we should, in the meantime, do what the discussion below
>>>> suggests anyway?
>>>>
>>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
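>>>>
>>>> If what it suggests is the usual advice of putting explicit timeouts on
>>>> the outbound HTTP client, this is roughly what we would add (a sketch,
>>>> untested; the names and durations are placeholders):
>>>>
>>>> package main
>>>>
>>>> import (
>>>> 	"net/http"
>>>> 	"time"
>>>> )
>>>>
>>>> // Sketch only: an outbound client whose requests cannot hang forever.
>>>> var apiClient = &http.Client{
>>>> 	Timeout: 2 * time.Minute, // hard cap on the whole request
>>>> 	Transport: &http.Transport{
>>>> 		TLSHandshakeTimeout:   10 * time.Second,
>>>> 		ResponseHeaderTimeout: 30 * time.Second, // server must start replying
>>>> 		IdleConnTimeout:       90 * time.Second, // drop stale pooled connections
>>>> 		ExpectContinueTimeout: 1 * time.Second,
>>>> 	},
>>>> }
>>>>
>>>> func main() {
>>>> 	resp, err := apiClient.Get("https://example.com/health") // placeholder URL
>>>> 	if err != nil {
>>>> 		return
>>>> 	}
>>>> 	resp.Body.Close()
>>>> }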
>>>>
>>>>
>>>> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels <reng...@ix.netcom.com>
>>>> wrote:
>>>>
>>>>> I think it is too hard to tell with the limited information. It could
>>>>> be exhausted connections, or it could be thrashing (given the claim of
>>>>> high CPU).
>>>>>
>>>>> I think you want to run a profiling capture prior to hitting the stuck
>>>>> state - you should be able to detect what is happening.
>>>>>
>>>>> If the referenced issue is related I would assume you should be able
>>>>> to connect by forcing http/1.
>>>>>
>>>>> You can also try disabling http/2 and see if your issue goes away.
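>>>>>
>>>>> Something like this (untested sketch; the address is a placeholder) forces
>>>>> HTTP/1.1 on both sides for the experiment - a non-nil, empty TLSNextProto
>>>>> map disables the automatic HTTP/2 upgrade, and setting
>>>>> GODEBUG=http2client=0,http2server=0 is the environment-variable equivalent:
>>>>>
>>>>> package main
>>>>>
>>>>> import (
>>>>> 	"crypto/tls"
>>>>> 	"net/http"
>>>>> )
>>>>>
>>>>> func main() {
>>>>> 	// Client side: no h2 negotiation on outbound requests.
>>>>> 	client := &http.Client{
>>>>> 		Transport: &http.Transport{
>>>>> 			TLSNextProto: make(map[string]func(string, *tls.Conn) http.RoundTripper),
>>>>> 		},
>>>>> 	}
>>>>> 	_ = client
>>>>>
>>>>> 	// Server side: serve HTTP/1.1 only, even over TLS.
>>>>> 	srv := &http.Server{
>>>>> 		Addr:         ":8443", // placeholder
>>>>> 		TLSNextProto: make(map[string]func(*http.Server, *tls.Conn, http.Handler)),
>>>>> 	}
>>>>> 	_ = srv
>>>>> }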
>>>>>
>>>>> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar <
>>>>> siddhesh.dive...@gmail.com> wrote:
>>>>>
>>>>> Hi Robert,
>>>>>
>>>>> Sorry I missed your earlier response.
>>>>>
>>>>> From what we saw, our UI was blocked, and since everything was
>>>>> unresponsive we had to recover the system by sending SIGABRT.
>>>>>
>>>>> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar <
>>>>> siddhesh.dive...@gmail.com> wrote:
>>>>>
>>>>>> Looking at the number of goroutines we have, does this issue apply
>>>>>> here?
>>>>>> https://github.com/golang/go/issues/27044
>>>>>>
>>>>>> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels <reng...@ix.netcom.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Go routines in a waiting state should not be consuming CPU. Are you
>>>>>>> certain they are not in constant transition from waiting to processing -
>>>>>>> this could show up as high CPU usage while everything looks blocked.
>>>>>>>
>>>>>>> I would use pprof - github.com/robaho/goanalyzer might be of
>>>>>>> assistance here to see the actual work being done.
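>>>>>>>
>>>>>>> A minimal way to wire that up (sketch only; the listen address is a
>>>>>>> placeholder) so you can pull profiles from the running process:
>>>>>>>
>>>>>>> package main
>>>>>>>
>>>>>>> import (
>>>>>>> 	"log"
>>>>>>> 	"net/http"
>>>>>>> 	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
>>>>>>> 	"runtime"
>>>>>>> )
>>>>>>>
>>>>>>> func main() {
>>>>>>> 	// Also record lock contention and blocking, given the theory that
>>>>>>> 	// goroutines are stuck on a mutex/lock.
>>>>>>> 	runtime.SetMutexProfileFraction(5)
>>>>>>> 	runtime.SetBlockProfileRate(1)
>>>>>>>
>>>>>>> 	go func() {
>>>>>>> 		log.Println(http.ListenAndServe("localhost:6060", nil))
>>>>>>> 	}()
>>>>>>> 	select {} // stand-in for the real server's work
>>>>>>> }
>>>>>>>
>>>>>>> Then go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
>>>>>>> for CPU, and /debug/pprof/goroutine, /debug/pprof/mutex and
>>>>>>> /debug/pprof/block for the rest.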
>>>>>>>
>>>>>>> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar <
>>>>>>> siddhesh.dive...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Ian,
>>>>>>>
>>>>>>> Thanks for replying.
>>>>>>>
>>>>>>> We have a Go server running that handles user requests and collects
>>>>>>> data from various sources such as GCP's Cloud SQL and BigQuery.
>>>>>>> We are using Shopify's sarama library for Kafka operations.
>>>>>>>
>>>>>>> We are seeing lots of goroutines in the waiting state for several
>>>>>>> minutes.
>>>>>>> Over time, around 587 goroutines have been spun up.
>>>>>>>
>>>>>>> We see that two goroutines are stuck on GCP BigQuery, and we are
>>>>>>> using wait groups there.
>>>>>>> However, it's not clear why that would cause all the other goroutines
>>>>>>> to hang and the CPU to go high.
>>>>>>>
>>>>>>> goroutine 3332131 [semacquire, 79 minutes]:
>>>>>>> sync.runtime_Semacquire(0xc001c4fcf8)
>>>>>>>         /usr/local/go/src/runtime/sema.go:56 +0x42
>>>>>>> sync.(*WaitGroup).Wait(0xc001c4fcf0)
>>>>>>>         /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>>>>>>
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>>>>>>> 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
>>>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:488
>>>>>>> +0x1d7
>>>>>>>
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>>>>>>> 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
>>>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:567
>>>>>>> +0xa0d
>>>>>>>
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>>>>>>> 0xc002356810, 0x24, 0x32f7b78, 0x5)
>>>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:77
>>>>>>> +0xf2
>>>>>>> created by
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>>>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:100
>>>>>>> +0xd8
>>>>>>>
>>>>>>>
>>>>>>> goroutine 3332149 [semacquire, 79 minutes]:
>>>>>>> sync.runtime_Semacquire(0xc0015ede48)
>>>>>>>         /usr/local/go/src/runtime/sema.go:56 +0x42
>>>>>>> sync.(*WaitGroup).Wait(0xc0015ede40)
>>>>>>>         /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>>>>>>
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>>>>>>> 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
>>>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:488
>>>>>>> +0x1d7
>>>>>>>
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>>>>>>> 0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
>>>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:567
>>>>>>> +0xa0d
>>>>>>>
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>>>>>>> 0xc002356ba0, 0x24, 0x32f7b78, 0x5)
>>>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:77
>>>>>>> +0xf2
>>>>>>> created by
>>>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>>>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:100
>>>>>>> +0xd8
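>>>>>>>
>>>>>>> For what it's worth, one thing we are considering is bounding each
>>>>>>> BigQuery call with a context deadline so the WaitGroup can never block
>>>>>>> forever. A rough sketch (not our real runParallelQuery; it assumes the
>>>>>>> cloud.google.com/go/bigquery client, and all names and the timeout are
>>>>>>> placeholders):
>>>>>>>
>>>>>>> package stats // hypothetical package name
>>>>>>>
>>>>>>> import (
>>>>>>> 	"context"
>>>>>>> 	"log"
>>>>>>> 	"sync"
>>>>>>> 	"time"
>>>>>>>
>>>>>>> 	"cloud.google.com/go/bigquery"
>>>>>>> )
>>>>>>>
>>>>>>> // runQueriesWithDeadline fans out queries and bounds each one with a
>>>>>>> // context deadline, so wg.Wait() cannot block indefinitely if BigQuery
>>>>>>> // never responds.
>>>>>>> func runQueriesWithDeadline(ctx context.Context, client *bigquery.Client, queries []string) {
>>>>>>> 	var wg sync.WaitGroup
>>>>>>> 	for _, q := range queries {
>>>>>>> 		wg.Add(1)
>>>>>>> 		go func(q string) {
>>>>>>> 			defer wg.Done()
>>>>>>> 			qctx, cancel := context.WithTimeout(ctx, 2*time.Minute) // placeholder timeout
>>>>>>> 			defer cancel()
>>>>>>> 			it, err := client.Query(q).Read(qctx)
>>>>>>> 			if err != nil {
>>>>>>> 				log.Printf("query failed or timed out: %v", err)
>>>>>>> 				return
>>>>>>> 			}
>>>>>>> 			_ = it // iterate rows here in the real code
>>>>>>> 		}(q)
>>>>>>> 	}
>>>>>>> 	wg.Wait()
>>>>>>> }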
>>>>>>>
>>>>>>> I found the link below, which seems to correlate with our scenario.
>>>>>>>
>>>>>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>>>>>>
>>>>>>> Most of the goroutines in the backtrace are in the net/http package,
>>>>>>> so our suspicion is that the above bug in our code might be causing this.
>>>>>>> Even the BigQuery call is getting hung in net/http.
>>>>>>>
>>>>>>> We are using Go version 1.13.8 and are running on a GCP Kubernetes
>>>>>>> cluster, on an Ubuntu 18.04 Docker image.
>>>>>>>
>>>>>>> go env
>>>>>>> GO111MODULE=""
>>>>>>> GOARCH="amd64"
>>>>>>> GOBIN=""
>>>>>>> GOCACHE="/root/.cache/go-build"
>>>>>>> GOENV="/root/.config/go/env"
>>>>>>> GOEXE=""
>>>>>>> GOFLAGS=""
>>>>>>> GOHOSTARCH="amd64"
>>>>>>> GOHOSTOS="linux"
>>>>>>> GONOPROXY=""
>>>>>>> GONOSUMDB=""
>>>>>>> GOOS="linux"
>>>>>>> GOPATH="/root/go"
>>>>>>> GOPRIVATE=""
>>>>>>> GOPROXY="https://proxy.golang.org,direct"
>>>>>>> GOROOT="/usr/local/go"
>>>>>>> GOSUMDB="sum.golang.org"
>>>>>>> GOTMPDIR=""
>>>>>>> GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
>>>>>>> GCCGO="gccgo"
>>>>>>> AR="ar"
>>>>>>> CC="gcc"
>>>>>>> CXX="g++"
>>>>>>> CGO_ENABLED="1"
>>>>>>> GOMOD="/builds/prosimoio/prosimo/pdash/go.mod"
>>>>>>> CGO_CFLAGS="-g -O2"
>>>>>>> CGO_CPPFLAGS=""
>>>>>>> CGO_CXXFLAGS="-g -O2"
>>>>>>> CGO_FFLAGS="-g -O2"
>>>>>>> CGO_LDFLAGS="-g -O2"
>>>>>>> PKG_CONFIG="pkg-config"
>>>>>>> GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0
>>>>>>> -fdebug-prefix-map=/tmp/go-build048009048=/tmp/go-build
>>>>>>> -gno-record-gcc-switches"
>>>>>>>
>>>>>>> Let me know if any other information is needed.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Aug 22, 2020 at 12:30 PM Ian Lance Taylor <i...@golang.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Sat, Aug 22, 2020 at 12:06 PM Siddhesh Divekar
>>>>>>>> <siddhesh.dive...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > We saw an issue with our process running in k8s on ubuntu 18.04.
>>>>>>>> > Most of the go routines are stuck for several minutes in
>>>>>>>> > http/http2 net code.
>>>>>>>> >
>>>>>>>> > Have you seen similar issues ?
>>>>>>>> >
>>>>>>>> > goroutine 2800143 [select, 324 minutes]:
>>>>>>>> > net/http.(*persistConn).readLoop(0xc00187d440)
>>>>>>>> >         /usr/local/go/src/net/http/transport.go:2032 +0x999
>>>>>>>> > created by net/http.(*Transport).dialConn
>>>>>>>> >         /usr/local/go/src/net/http/transport.go:1580 +0xb0d
>>>>>>>> >
>>>>>>>> > goroutine 2738894 [IO wait, 352 minutes]:
>>>>>>>> > internal/poll.runtime_pollWait(0x7f5b61b280c0, 0x72, 0xffffffffffffffff)
>>>>>>>> >         /usr/local/go/src/runtime/netpoll.go:184 +0x55
>>>>>>>> > internal/poll.(*pollDesc).wait(0xc0017e7e18, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
>>>>>>>> >         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
>>>>>>>> > internal/poll.(*pollDesc).waitRead(...)
>>>>>>>> >         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
>>>>>>>> > internal/poll.(*FD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
>>>>>>>> >         /usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf
>>>>>>>> > net.(*netFD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0xc0026359e8, 0x49d7fd, 0xc0017e7e00)
>>>>>>>> >         /usr/local/go/src/net/fd_unix.go:202 +0x4f
>>>>>>>> > net.(*conn).Read(0xc0000db8b8, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
>>>>>>>> >         /usr/local/go/src/net/net.go:184 +0x68
>>>>>>>> > net/http.(*connReader).Read(0xc004a4fec0, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
>>>>>>>> >         /usr/local/go/src/net/http/server.go:785 +0xf4
>>>>>>>> > bufio.(*Reader).fill(0xc003f1a360)
>>>>>>>> >         /usr/local/go/src/bufio/bufio.go:100 +0x103
>>>>>>>> > bufio.(*Reader).Peek(0xc003f1a360, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc002635ad0)
>>>>>>>> >         /usr/local/go/src/bufio/bufio.go:138 +0x4f
>>>>>>>> > net/http.(*conn).readRequest(0xc0028e1d60, 0x393ed20, 0xc0024e9780, 0x0, 0x0, 0x0)
>>>>>>>> >         /usr/local/go/src/net/http/server.go:962 +0xb3b
>>>>>>>> > net/http.(*conn).serve(0xc0028e1d60, 0x393ed20, 0xc0024e9780)
>>>>>>>> >         /usr/local/go/src/net/http/server.go:1817 +0x6d4
>>>>>>>> > created by net/http.(*Server).Serve
>>>>>>>> >         /usr/local/go/src/net/http/server.go:2928 +0x384
>>>>>>>> >
>>>>>>>> > Is there a known issue, or something obvious from the backtrace here?
>>>>>>>>
>>>>>>>> It's entirely normal for goroutines to sit in pollWait if they are
>>>>>>>> waiting for network I/O.  There may be reasons why this is incorrect
>>>>>>>> for your program, but you'll have to tell us those reasons.
>>>>>>>>
>>>>>>>> Also, along with those reasons, please tell us the version of Go and
>>>>>>>> the exact environment that you are running.  Thanks.
>>>>>>>>
>>>>>>>> Ian
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -Siddhesh.
>>>>>>>
>>>>>>> <backtrace.txt>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> -Siddhesh.
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -Siddhesh.
>>>>>
>>>>>
>>>>
>>>> --
>>>> -Siddhesh.
>>>>
>>>>
>>>
>>> --
>>> -Siddhesh.
>>>
>>>
>>>
>>
>> --
>> -Siddhesh.
>>
>
>
> --
> -Siddhesh.
>
>
>
>

-- 
-Siddhesh.
