On Thu, Feb 14, 2019 at 10:36:42PM +0100, Johannes Schindelin wrote:
> On Thu, 14 Feb 2019, Randall S. Becker wrote:
>
> > t5562 still hangs (blocking) - this breaks our CI pipeline since the
> > test hangs and we have no explanation of whether the hang is in git or
> > the tests.
>
> I have "good" news: it now also hangs on Ubuntu 16.04 in Azure Pipelines'
> Linux agents.

I haven't yet seen that hang in the wild and couldn't reproduce it on
purpose, but there is definitely something fishy with t5562 even on
Linux and even without that perl generate_zero_bytes helper.

$ git checkout cc95bc2025^
Previous HEAD position was cc95bc2025 t5562: replace /dev/zero with a pipe from generate_zero_bytes
HEAD is now at 24b451e77c t5318: replace use of /dev/zero with generate_zero_bytes
$ make
<snip>
$ cd t
# take note of the shell's PID
$ echo $$
15522
$ ./t5562-http-backend-content-length.sh --stress |tee LOG
OK 3.0
OK 1.0
OK 6.0
OK 0.0
<snap>
And then in another terminal run this:
$ pstree -a -p 15522
or, to make it easier to notice what changed and what stayed the same:
$ watch -d pstree -a -p 15522
Sooner or later the output will look like this:
bash,15522
  └─t5562-http-back,21082 ./t5562-http-backend-content-length.sh --stress
      ├─t5562-http-back,21089 ./t5562-http-backend-content-length.sh --stress
      │   └─sh,24906 ./t5562-http-backend-content-length.sh --stress
      ├─t5562-http-back,21090 ./t5562-http-backend-content-length.sh --stress
      │   └─sh,26660 ./t5562-http-backend-content-length.sh --stress
      ├─t5562-http-back,21092 ./t5562-http-backend-content-length.sh --stress
      │   └─sh,4202 ./t5562-http-backend-content-length.sh --stress
      │       └─sh,5696 ./t5562-http-backend-content-length.sh --stress
      │           └─perl,5697 /home/szeder/src/git/t/t5562/invoke-with-content-length.pl push_body.gz.trunc git http-backend
      │               └─(git,5722)
      ├─t5562-http-back,21093 ./t5562-http-backend-content-length.sh --stress
      │   └─sh,25572 ./t5562-http-backend-content-length.sh --stress
<snip>
It won't show most of the processes run in the tests, because they are
just too fast and short-lived. However, occasionally it does show a
stuck git process, which is shown as <defunct> in regular 'ps aux'
output:
szeder 5722 0.0 0.0 0 0 pts/16 Z+ 13:36 0:00 [git] <defunct>
Note that this is not a "proper" hang: the process is not stuck
forever, but only for about 1 minute, after which it disappears, and
the test continues and eventually finishes successfully. I've looked
into the logs of a couple of such stuck jobs, and the test in which
that git process gets stuck varies from job to job.
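
For anyone who would rather log these defunct processes than stare at
pstree, something along these lines might help. It is only a rough
sketch: the function names filter_zombies and watch_zombies are made up
for illustration, and it assumes a procps-style 'ps' that understands
'-eo pid,ppid,stat,etime,comm' and reports zombies with a STAT starting
with 'Z'.

```shell
# Hypothetical helper: read 'ps -eo pid,ppid,stat,etime,comm' output on
# stdin and print "PID PPID ELAPSED" for every zombie whose command name
# equals $1.  The header line is skipped.
filter_zombies () {
	awk -v name="$1" 'NR > 1 && $3 ~ /^Z/ && $5 == name { print $1, $2, $4 }'
}

# Hypothetical helper: poll the process table once per second and print
# a timestamped line for each zombie named $1 that is currently visible.
# Runs until interrupted.
watch_zombies () {
	while :
	do
		ps -eo pid,ppid,stat,etime,comm |
		filter_zombies "$1" |
		while read -r pid ppid etime
		do
			echo "$(date +%T) zombie $1 pid=$pid ppid=$ppid etime=$etime"
		done
		sleep 1
	done
}
```

Running 'watch_zombies git |tee ZOMBIES' in the second terminal while
the --stress run is going should then give a rough record of when a
defunct git shows up, which perl process is its parent, and how long it
lingers.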