Hi Brian, thanks for your help. Al my go downloads are from go.dev In this tests I am using
$ go version go version go1.18 linux/amd64 I run the tests in a laptop with linux as the main OS, no emulation. $ grep "model" /proc/cpuinfo | sort | uniq -c 12 model : 165 12 model name : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz $ uname -r 5.4.0-107-generic $ cat /etc/os-release NAME="Linux Mint" VERSION="20.3 (Una)" I just spent 5 hours obsesively trying to reproduce it and I know it sounds very strange. The 100% CPU spikes is how I started noticing this, because the laptop fan would start spinning at full speed. Since then, I am using go1.16 from go.dev and "go install golang.org/dl/go1.16@latest && go1.16 download" and I haven't experienced it again. I checked the issue links and it seems similar to what I am seeing, but only from version 1.17 and on. Also they say you can dissable it with GODEBUG=asyncpreemptoff=1 but I see the problem in the build version. Maybe the difference with go run is that it probably adds debug flags? I expected it to just compile the program in a temp location and run it, but this is a consistent difference I just tested: $ go run test.go And in another console: $ ps aux | grep test.go bill 143464 2.5 0.1 1897584 16480 pts/3 Sl+ 09:54 0:00 go run test.go bill 143604 0.0 0.0 11568 2764 pts/4 S+ 09:54 0:00 grep --color=auto test.go $ sudo strace -f -tt -p 143464 strace: Process 143464 attached with 17 threads [pid 143544] 09:54:42.564230 futex(0xc000224148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143485] 09:54:42.564321 futex(0xc00056a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143478] 09:54:42.564340 epoll_pwait(4, <unfinished ...> [pid 143477] 09:54:42.564366 futex(0xc000580548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143476] 09:54:42.564386 futex(0xc000680148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143475] 09:54:42.564407 futex(0xc000580148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143474] 09:54:42.564430 futex(0xc000500148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143473] 09:54:42.564451 futex(0xdf3e80, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143472] 09:54:42.564471 futex(0xc000400148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143471] 09:54:42.564493 futex(0xc000380148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143470] 09:54:42.564515 futex(0xc000300148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143469] 09:54:42.564535 futex(0xdf3d38, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143468] 09:54:42.564556 futex(0xc000090148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143467] 09:54:42.564576 futex(0xc000023148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143466] 09:54:42.564596 futex(0xc000022d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143465] 09:54:42.564616 restart_syscall(<... resuming interrupted read ...> <unfinished ...> [pid 143464] 09:54:42.564637 waitid(P_PID, 143586, <unfinished ...> [pid 143465] 09:55:14.806222 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out) [pid 143465] 09:55:14.806608 nanosleep({tv_sec=0, tv_nsec=20000}, NULL) = 0 [pid 143465] 09:55:14.807053 futex(0xdc55d8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} In another console I would request multiple times: $ curl http://localhost:8080 or $ wrk -d2 http://localhost:8080 but strace won't show anything else. It just stays like that and CPU's show 0% activity when is not getting requests. On the other hand, if I compile it: $ go build test.go $ ./test Check strace in another console, it starts like this: $ ps aux | grep ./test bill 143817 0.0 0.0 1077296 4448 pts/3 Sl+ 09:57 0:00 ./test bill 143829 0.0 0.0 11568 712 pts/4 S+ 09:58 0:00 grep --color=auto ./test $ sudo strace -f -tt -p 143817 strace: Process 143817 attached with 6 threads [pid 143818] 09:58:23.811515 restart_syscall(<... resuming interrupted read ...> <unfinished ...> [pid 143822] 09:58:23.811538 futex(0x85f438, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143821] 09:58:23.811561 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143820] 09:58:23.811580 futex(0xc000015148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143819] 09:58:23.811585 futex(0xc000014d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143817] 09:58:23.811613 epoll_pwait(5, But below is what I get with a single curl request: $ sudo strace -f -tt -p 143817 strace: Process 143817 attached with 6 threads [pid 143818] 09:58:23.811515 restart_syscall(<... resuming interrupted read ...> <unfinished ...> [pid 143822] 09:58:23.811538 futex(0x85f438, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143821] 09:58:23.811561 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143820] 09:58:23.811580 futex(0xc000015148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143819] 09:58:23.811585 futex(0xc000014d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143817] 09:58:23.811613 epoll_pwait(5, <unfinished ...> [pid 143818] 09:58:42.447774 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out) [pid 143818] 09:58:42.447898 nanosleep({tv_sec=0, tv_nsec=20000}, NULL) = 0 [pid 143818] 09:58:42.448141 futex(0x830ed8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...> [pid 143817] 09:59:30.164453 <... epoll_pwait resumed>[{EPOLLIN, {u32=1288671128, u64=140674352517016}}], 128, -1, NULL, 0) = 1 [pid 143817] 09:59:30.164531 futex(0x830ed8, FUTEX_WAKE_PRIVATE, 1) = 1 [pid 143818] 09:59:30.164725 <... futex resumed>) = 0 [pid 143817] 09:59:30.164737 accept4(3, <unfinished ...> [pid 143818] 09:59:30.164859 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...> [pid 143817] 09:59:30.164889 <... accept4 resumed>{sa_family=AF_INET6, sin6_port=htons(40122), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [112->28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 4 [pid 143817] 09:59:30.164962 epoll_ctl(5, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1288670888, u64=140674352516776}} <unfinished ...> [pid 143818] 09:59:30.164993 <... nanosleep resumed>NULL) = 0 [pid 143817] 09:59:30.165040 <... epoll_ctl resumed>) = 0 [pid 143818] 09:59:30.165072 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...> [pid 143817] 09:59:30.165097 getsockname(4, {sa_family=AF_INET6, sin6_port=htons(8080), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [112->28]) = 0 [pid 143817] 09:59:30.165142 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 [pid 143817] 09:59:30.165162 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 [pid 143817] 09:59:30.165180 setsockopt(4, SOL_TCP, TCP_KEEPINTVL, [15], 4) = 0 [pid 143817] 09:59:30.165199 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [15], 4 <unfinished ...> [pid 143818] 09:59:30.165209 <... nanosleep resumed>NULL) = 0 [pid 143817] 09:59:30.165215 <... setsockopt resumed>) = 0 [pid 143818] 09:59:30.165222 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...> [pid 143817] 09:59:30.165271 futex(0xc000015148, FUTEX_WAKE_PRIVATE, 1) = 1 [pid 143820] 09:59:30.165311 <... futex resumed>) = 0 [pid 143817] 09:59:30.165317 accept4(3, 0xc00012db20, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) [pid 143818] 09:59:30.165373 <... nanosleep resumed>NULL) = 0 [pid 143820] 09:59:30.165380 epoll_pwait(5, <unfinished ...> [pid 143818] 09:59:30.165403 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...> [pid 143820] 09:59:30.165438 <... epoll_pwait resumed>[{EPOLLIN|EPOLLOUT, {u32=1288670888, u64=140674352516776}}], 128, 0, NULL, 0) = 1 [pid 143820] 09:59:30.165501 epoll_pwait(5, <unfinished ...> [pid 143818] 09:59:30.165509 <... nanosleep resumed>NULL) = 0 [pid 143817] 09:59:30.165515 read(4, <unfinished ...> [pid 143818] 09:59:30.165536 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...> [pid 143817] 09:59:30.165564 <... read resumed>"GET / HTTP/1.1\r\nHost: localhost:"..., 4096) = 78 [pid 143817] 09:59:30.165595 futex(0xc000080148, FUTEX_WAKE_PRIVATE, 1) = 1 [pid 143821] 09:59:30.165610 <... futex resumed>) = 0 [pid 143821] 09:59:30.165624 nanosleep({tv_sec=0, tv_nsec=3000}, <unfinished ...> [pid 143817] 09:59:30.165632 write(4, "HTTP/1.1 200 OK\r\nDate: Mon, 11 A"..., 120 <unfinished ...> [pid 143818] 09:59:30.165642 <... nanosleep resumed>NULL) = 0 [pid 143817] 09:59:30.165669 <... write resumed>) = 120 [pid 143821] 09:59:30.165700 <... nanosleep resumed>NULL) = 0 [pid 143818] 09:59:30.165706 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...> [pid 143821] 09:59:30.165716 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143817] 09:59:30.165739 read(4, 0xc000176000, 4096) = -1 EAGAIN (Resource temporarily unavailable) [pid 143817] 09:59:30.165785 futex(0x830108, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 143820] 09:59:30.165794 <... epoll_pwait resumed>[{EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=1288670888, u64=140674352516776}}], 128, 4999, NULL, 34914803367208) = 1 [pid 143820] 09:59:30.165809 read(4, "", 4096) = 0 [pid 143820] 09:59:30.165832 epoll_ctl(5, EPOLL_CTL_DEL, 4, 0xc000123974) = 0 [pid 143820] 09:59:30.165850 close(4 <unfinished ...> [pid 143818] 09:59:30.165858 <... nanosleep resumed>NULL) = 0 [pid 143818] 09:59:30.165870 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...> [pid 143820] 09:59:30.165879 <... close resumed>) = 0 [pid 143820] 09:59:30.165892 epoll_pwait(5, [], 128, 0, NULL, 1) = 0 [pid 143820] 09:59:30.165909 epoll_pwait(5, <unfinished ...> [pid 143818] 09:59:30.165953 <... nanosleep resumed>NULL) = 0 [pid 143818] 09:59:30.165963 futex(0x830ed8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} and the output keeps growing like that with every new curl request until eventually it starts constantly printing output like this: [pid 143818] 10:02:22.697120 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.707609 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.718125 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.728753 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.739195 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.749618 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.760160 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.770669 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.781137 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.791581 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.802060 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.812540 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.822981 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 [pid 143818] 10:02:22.833468 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 In htop, without strace attached, CPU is constant about 0.7% for the first one or two cores. El lunes, 11 de abril de 2022 a las 8:54:10 UTC+2, Brian Candler escribió: > Those nanosleep calls are at 10ms intervals - i.e. 100 calls per second. > I think this is normal behaviour of the Go scheduler, which signals itself > every 10ms to force a preemptive context switch. That feature was > introduced in go 1.14, at which point you were able to disable it at > runtime setting environment variable GODEBUG=asyncpreemptoff=1 (I haven't > tested whether you can still do this). > > There were some problems when this feature was first introduced (see this > <https://github.com/kubernetes/kubernetes/issues/92521> and this > <https://github.com/cockroachdb/cockroach/pull/52348>) but improvements > were made in 1.15 and 1.16. I don't see anything in the 1.17 release > notes <https://go.dev/doc/go1.17> which would affect this. > > But in any case: > 1. That won't cause CPU to go to 100% > 2. I can't think why there should be any difference between "go run", > versus "go build" and running the resulting executable. > 3. I can't think why changing HTTP timeouts from 30 seconds to 1 second > should make any difference. > > Point (2) is going to make the problem hard to reproduce for other people. > > To start with, I think you need to be more specific about: > - exactly what versions of go you're comparing ("go version"), and where > you got them from (e.g. directly from go downloads page or from OS vendor > packages) > - what exact Linux kernel you're running (uname -a) > - what exact version and distro of Linux you're using > - your CPU type (grep "model" /proc/cpuinfo | sort | uniq -c) > - anything that's potentially different about your environment - e.g. are > you running Linux under an emulation layer like WSL (Windows) > > On Sunday, 10 April 2022 at 23:02:40 UTC+1 scorr...@gmail.com wrote: > >> Hi, >> >> On Linux, if I compile and run this simple program: >> >> package main >> >> import ( >> "fmt" >> "net/http" >> "time" >> ) >> >> type x struct{} >> >> func (x) ServeHTTP(w http.ResponseWriter, r *http.Request) { >> w.Write([]byte("aaaa")) >> } >> >> func main() { >> sp := &http.Server{ >> ReadHeaderTimeout: 5 * time.Second, >> ReadTimeout: 5 * time.Second, >> WriteTimeout: 5 * time.Second, >> IdleTimeout: 30 * time.Second, >> Addr: ":8080", >> Handler: x{}, >> } >> >> fmt.Println(sp.ListenAndServe()) >> } >> >> After a while I start to see constant CPU activity (1% to 7%) even when >> the server is idle. >> >> If I run "sudo strace -f -c -p <pid>" I get constant very fast calls like: >> >> [pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 >> [pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 >> [pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 >> [pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 >> [pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 >> .... >> >> It never stops. Sometimes the CPU goes to 100% for a while and then go >> back down to 1%.. This happens only in one or two cores. >> >> This doesn't happen with Go 1.16. I have been checking all versions and >> it starts in Go 1.17 >> >> If I run the program with go run instead of compiling and running the >> binary, I can't reproduce it. >> >> Sometimes it starts after a couple of http calls and sometimes I do a few >> "wrk -d2 http://localhost:8080" calls to trigger it (wrk is a web stress >> tool). But as soon as it starts making the nanosleep calls, it doesn't >> stop, no matter how long the server is iddle. >> >> If I remove the timeout values or set them to a short period, like one >> second for example, I am also not able to reproduce it. But with longer >> values it happens very quickly. I am surprised of not seing it reported. >> >> Do you know what can be going on? >> >> Thank you! >> >> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/9ad73d36-974c-43ec-8a5f-570f030ac69en%40googlegroups.com.