On Tue, 9 Jan 2024 09:22:57 +0000 Carsten Haitzler <ras...@rasterman.com> said:
> On Mon, 8 Jan 2024 23:20:16 -0800 Ross Vandegrift <r...@kallisti.us> said:
>
> > On Mon, Jan 08, 2024 at 11:08:55PM +0000, Carsten Haitzler wrote:
> > > try run the above eina test suite and pipe to something that makes it
> > > timeout... and strace it - or gdb attach to it and find out where it's
> > > sitting? it should complete in < 1 sec so launch and immediately try and
> > > strace and/or gdb attach and find out where it's at - if it is still
> > > around.
> > >
> > > is somehow a forked child not coming back that it expects to... ?
> >
> > Yea, it's something like this. I found out it hangs for exactly 60s, which
> > led me to timeout.c. I also learned strace -f triggers the issue.
> > CK_FORK=no fixes the hang as well.
> >
> > I added debug printfs to efl_check.h and timeout.c - when eina_suite tries
> > to kill timeout, it kills the wrong pid:
> >
> > $ ./build/src/tests/eina/eina_suite fp
> > Running suite(s): eina_init_module
> > 100%: Checks: 0, Failures: 0, Errors: 0
> > -------------------- efl_check forked timeout: 296393 <-----
> > -------------------- efl_check forked timeout: 0
> > Running suite(s): Eina
> > -------------------- timeout.c my pid: 296396 <-----
> > Max delta(multiplication): 0.007627 (0.061668%)
> > Max delta(division): 0.000173 (0.740211%)
> > 100%: Checks: 4, Failures: 0, Errors: 0
> > -------------------- efl_check killing timeout child: 296393
> > -------------------- efl_check cleared timeout_pid: 0
> >
> > So eina_suite.c gets the wrong pid from fork(). In a simple standalone
> > program, fork() behaves as expected.
> >
> > I'm going to compare the arch & debian check packages for any suspicious
> > differences. And maybe walk through more carefully with gdb. But I'm out
> > of time tonight.
> >
> > Ross
>
> some more testing. i ran:
>
> ./src/tests/eina/eina_suite | wc -l
>
> and... guess what... eina_suite has gone but wc is still there waiting. this
> is far deeper...
> is there some cgroup, selinux or something getting in the way? is it a
> kernel bug? a glibc bug? i just have to say.. if the efl test process is
> gone - and wc is still waiting the problem is somewhere in the plumbing
> between these IMHO... at least that's what my brain is thinking right now.
> pstree:
>
> │ │ │ │ ├─terminology─┬─zsh───wc
> │ │ │ │ │             └─3*[{terminology}]
>
> :(

and an update... strace'd eina suite + wc and.. well. eina_suite:

...
close(3) = 0
close(4) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=55000, si_uid=1000, si_status=SIGKILL, si_utime=0, si_stime=0} ---
exit_group(0) = ?
+++ exited with 0 +++

so it exited... with a nice 0 exit code but wc is:

...
read(0, "100%: Checks: 7, Failures: 0, Er"..., 16384) = 40
read(0, "100%: Checks: 3, Failures: 0, Er"..., 16384) = 40
read(0, "100%: Checks: 10, Failures: 0, E"..., 16384) = 41
read(0,

it's sitting on a blocking read ... obviously on the fd that was tied to the
pipe to eina suite... and that read doesn't complete. it should at this point
return ... but doesn't.

in the meantime i'll commit my extended 204sec timeout changes as well as some
more error checking of fork and execl - in this case it's not these as the
execl for the timeout binary does work (it's optimistically assumed to always
work and never fail... which is not our problem here), and fork is assumed to
always work and never fail - i added some return checks there but again this
is not the issue...

so what i have found is... eina_suite exits... but the timeout binary does
not - it's still sleeping, sitting around instead of terminating with the
parent. it SHOULD terminate with the parent no matter what... but doesn't.
and well eina suite is killing timeout - but it's the /bin/sh parent of
timeout (execl uses /bin/sh to run timeout). this doesn't take timeout down
with it though... i would have thought a sigpipe from the parent /bin/sh
should have done this...
as timeout was not detached from the shell with a &... so this has something
to do with /bin/sh ... some change? so the timeout process stays around
keeping the whole parent shell + pipe alive.

committing some fixes to work around this /bin/sh oddity along with the above
fork+execl return checks to be less optimistic.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
Carsten Haitzler - ras...@rasterman.com

_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
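
for illustration only: one common way to make sure a helper started through /bin/sh cannot outlive the test is to put the shell in its own process group and signal the whole group, so that killing the shell also takes down whatever it spawned (and closes any pipe fds those processes inherited). this is a minimal sketch of that idea, not the fix that was committed. the helper names spawn_in_own_group() and kill_group() are hypothetical, and it assumes a POSIX system with /bin/sh. it also checks the fork and execl results rather than assuming they always succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` via /bin/sh in a new process group.
 * Returns the shell's pid (which is also the group id), or -1 on error. */
pid_t
spawn_in_own_group(const char *cmd)
{
   pid_t pid = fork();

   if (pid < 0)
     {
        perror("fork");
        return -1;
     }
   if (pid == 0)
     {
        /* child: lead a new group so the shell and everything it starts
         * can be signalled together */
        setpgid(0, 0);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        perror("execl"); /* only reached if execl failed */
        _exit(127);
     }
   /* parent: set the group too, so it exists before we try to signal it.
    * this fails harmlessly if the child has already exec'd. */
   setpgid(pid, pid);
   return pid;
}

/* Hypothetical helper: SIGKILL every process in the group, then reap the
 * shell. Returns 0 on success, -1 on error. */
int
kill_group(pid_t pgid)
{
   int status;

   if (pgid <= 0) return -1;
   /* a negative pid addresses the whole process group */
   if (kill(-pgid, SIGKILL) < 0)
     {
        perror("kill");
        return -1;
     }
   while (waitpid(pgid, &status, 0) < 0)
     {
        if (errno != EINTR)
          {
             perror("waitpid");
             return -1;
          }
     }
   return 0;
}
```

usage would look like `pid_t p = spawn_in_own_group("sleep 60; :");` followed later by `kill_group(p);`. plain `kill(p, SIGKILL)` only hits the shell, which is the situation described above where timeout stays around holding the pipe open.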