On Fri, Jul 3, 2020 at 5:54 AM Hugo Cornelis <hugo.corne...@essensium.com> wrote: > > Thanks for your answer. > > On Mon, Jun 29, 2020 at 9:10 PM Ian Lance Taylor <i...@golang.org> wrote: >> >> Thanks for the background. >> >> Earlier I suggested looking at the output of "strace -f" for the >> programs that fail. Does that show anything of interest? > > > What follows is the analysis of one strace (strace -fv -s 100) attached to > the docker daemon. > > The strace log file shows the creation of a chain of processes: dockerd forks > containerd forks containerd-shim forks runc forks a command that runs inside > the container (the command is '/usr/bin/find .'). This is also expected. > > When the I/O of the process /usr/bin/find in the docker container is blocked, > strace shows that the Golang schedulers are still active: > > (line 298015) > [pid 2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} > <unfinished ...> > [pid 2264] sched_yield( <unfinished ...> > [pid 2270] swapcontext(0x2190a580, 0 <unfinished ...> > [pid 2264] <... sched_yield resumed>) = 0 > [pid 2270] <... swapcontext resumed>) = 0 > [pid 2264] sched_yield( <unfinished ...> > [pid 2270] swapcontext(0, 0x214dda80 <unfinished ...> > [pid 2264] <... sched_yield resumed>) = 0 > [pid 2270] <... swapcontext resumed>) = 558750336 > [pid 2264] sched_yield( <unfinished ...> > [pid 2270] swapcontext(0, 0x2190a580 <unfinished ...> > [pid 2264] <... sched_yield resumed>) = 0 > [pid 2270] <... swapcontext resumed>) = 563127680 > [pid 2264] sched_yield( <unfinished ...> > [pid 2270] swapcontext(0x2190a580, 0 <unfinished ...> > [pid 2264] <... sched_yield resumed>) = 0 > [pid 2270] <... swapcontext resumed>) = 0 > ... > (line 298134) > [pid 2266] <... _newselect resumed>) = 0 (Timeout) > [pid 2264] <... sched_yield resumed>) = 0 > [pid 2270] <... swapcontext resumed>) = 563127680 > [pid 2266] epoll_wait(4, <unfinished ...> > [pid 2264] sched_yield( <unfinished ...> > [pid 2270] swapcontext(0x2190a580, 0 <unfinished ...> > [pid 2266] <... epoll_wait resumed>[], 128, 0) = 0 > [pid 2264] <... sched_yield resumed>) = 0 > [pid 2270] <... swapcontext resumed>) = 0 > [pid 2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} > <unfinished ...> > > TIDs 2264, 2266 and 2270 belong to the process runc. The strace log has > similar straces for the other processes (dockerd, containerd, > containerd-shim), so I assume also their goroutine schedulers were active. I > am actually wondering how to relate the arguments listed in the strace file > to the Golang or C code. > > Just before the container gets blocked it runs the command 'find .' that > should produce output to the terminal (but there is no output at first, that > is the problem). The data is visible in the strace log through the 'write()' > system call: > > [pid 2204] execve("/usr/bin/find", ["/usr/bin/find", "."], > ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", > "HOSTNAME=caae2fb3bccf", "TERM=xterm", "HOME=/root"] <unfinished ...> > ... > [pid 2204] lstat64(".", <unfinished ...> > ... > [pid 2204] write(1, ".\n", 2 <unfinished ...> > ... > [pid 2204] lstat64("./var", <unfinished ...> > ... > [pid 2204] write(1, "./var\n", 6 <unfinished ...> > ... > > The 'find' process writes 626 lines to stdout (18778 characters, this seems > to be reproducible). The last lines are: > > ... > [pid 2204] lstat64("./sys/kernel/slab/rpc_buffers", <unfinished ...> > ... > [pid 2204] write(1, "./sys/kernel/slab/rpc_buffers\n", 30 <unfinished ...> > > All the write() system calls except the last one are successfully completed. > The last one remains blocked. At that time PID 2204 hangs for a long time > > runc / dockerd / containerd / containerd-shim have continuous activity as in > the first strace that I showed above. > > When I terminate the docker process with a signal SIGINT, the characters > written by PID 2204 are suddenly flushed to the terminal. > > Are there anything specific things to look for in the strace files (eg. > specific epoll() calls). Is there a way to map the arguments and return > values of swapcontext() to goroutines, or would this be a useless thing to > try to do? > > This is the analysis of one trial. Some of the other trials did not start > the full chain of processes, it looks like the behaviour of the bug is also > timing dependent.
That looks like the process is writing to a pipe that nothing is reading from. Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAOyqgcWn-em6UzwUifemOttSToZNyNJ2VoqMZsH%2B-%3DhdUVoC%3DQ%40mail.gmail.com.