On Tue, 9 Jan 2024 09:22:57 +0000 Carsten Haitzler <ras...@rasterman.com> said:

> On Mon, 8 Jan 2024 23:20:16 -0800 Ross Vandegrift <r...@kallisti.us> said:
> 
> > On Mon, Jan 08, 2024 at 11:08:55PM +0000, Carsten Haitzler wrote:
> > > try run the above eina test suite and pipe to something that makes it
> > > timeout... and strace it - or gdb attach to it and find out where it's
> > > sitting? it should complete in < 1 sec so launch and immediately try and
> > > strace and/or gdb attach and find out where it's at - if it is still
> > > around.
> > > 
> > > is somehow a forked child not coming back that it expects to... ?
> > 
> > Yea, it's something like this.  I found out it hangs for exactly 60s, which
> > led me to timeout.c.  I also learned strace -f triggers the issue.
> > CK_FORK=no fixes the hang as well.
> > 
> > I added debug printfs to efl_check.h and timeout.c - when eina_suite tries
> > to kill timeout, it kills the wrong pid:
> > 
> >   $ ./build/src/tests/eina/eina_suite fp
> >   Running suite(s): eina_init_module
> >   100%: Checks: 0, Failures: 0, Errors: 0
> >   -------------------- efl_check forked timeout: 296393    <-----
> >   -------------------- efl_check forked timeout: 0
> >   Running suite(s): Eina
> >   -------------------- timeout.c my pid: 296396            <-----
> >   Max delta(multiplication): 0.007627 (0.061668%)
> >   Max delta(division): 0.000173 (0.740211%)
> >   100%: Checks: 4, Failures: 0, Errors: 0
> >   -------------------- efl_check killing timeout child: 296393
> >   -------------------- efl_check cleared timeout_pid: 0
> > 
> > So eina_suite.c gets the wrong pid from fork().  In a simple standalone
> > program, fork() behaves as expected.
> > 
> > I'm going to compare the arch & debian check packages for any suspicious
> > differences.  And maybe walk through more carefully with gdb.  But I'm out
> > of time tonight.
> > 
> > Ross
> 
> some more testing. i ran:
> 
>  ./src/tests/eina/eina_suite | wc -l
> 
> and... guess what... eina_suite has gone but wc is still there waiting. this
> is far deeper... is there some cgroup, selinux or some such thing getting in
> the way? is it a kernel bug? a glibc bug? i just have to say.. if the efl test
> process is gone - and wc is still waiting - the problem is somewhere in the
> plumbing between these IMHO... at least that's what my brain is thinking
> right now. pstree:
> 
>   ├─terminology─┬─zsh───wc
>   │             └─3*[{terminology}]
> 
> :(

and an update... strace'd eina suite + wc and.. well. eina_suite:

...
close(3)                                = 0
close(4)                                = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=55000, si_uid=1000, si_status=SIGKILL, si_utime=0, si_stime=0} ---
exit_group(0)                           = ?
+++ exited with 0 +++

so it exited... with a nice 0 exit code but wc is:
...
read(0, "100%: Checks: 7, Failures: 0, Er"..., 16384) = 40
read(0, "100%: Checks: 3, Failures: 0, Er"..., 16384) = 40
read(0, "100%: Checks: 10, Failures: 0, E"..., 16384) = 41
read(0,

it's sitting on a blocking read ... obviously on the fd that was tied to the
pipe to eina suite... and that read doesn't complete. it should at this point
return ... but doesn't.

in the meantime i'll commit my extended 204sec timeout changes as well as some
more error checking of fork and execl. in this case neither is the culprit: the
execl for the timeout binary does work (it was optimistically assumed to always
work and never fail, which is not our problem here), and fork was likewise
assumed to always work and never fail. i added some return checks there, but
again this is not the issue...
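
for illustration, here is a minimal sketch of what such return checks can look
like. this is not the actual efl_check.h code: spawn_checked() and
wait_exit_status() are hypothetical names, and the exit code 127 is just the
usual shell convention for "could not run the command".

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the real efl_check.h code): run `path` with one
 * argument in a child process. Returns the child's pid, or -1 if fork()
 * itself failed. */
static pid_t
spawn_checked(const char *path, const char *arg)
{
   pid_t pid = fork();

   if (pid == -1)
     {
        perror("fork");
        return -1;
     }
   if (pid == 0)
     {
        /* child: execl() only ever returns if it failed */
        execl(path, path, arg, (char *)NULL);
        perror("execl");
        _exit(127); /* assumption: reuse the shell's "cannot run" code */
     }
   return pid;
}

/* Reap the child. Returns its exit status, or -1 if it did not exit
 * normally (e.g. it was killed by a signal). */
static int
wait_exit_status(pid_t pid)
{
   int status = 0;

   if (waitpid(pid, &status, 0) == -1) return -1;
   return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

with this, a caller can tell "fork failed" (return value -1) apart from "exec
failed" (child exits with 127) instead of assuming both always succeed.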

so what i have found is... eina_suite exits... but the timeout binary does not
- it's still sitting around sleeping instead of terminating with the
parent. it SHOULD terminate with the parent no matter what... but doesn't.

and well, eina suite is killing timeout - but what it actually kills is the
/bin/sh parent of timeout (execl uses /bin/sh to run timeout). this doesn't
take timeout down with it though... i would have thought a sigpipe from the
parent /bin/sh should have done this... as timeout was not detached from the
shell with a &... so this has something to do with /bin/sh ... some change? so
the timeout process stays around, keeping the whole parent shell + pipe alive,
which is why the read in wc never returns.
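
the pipe behaviour itself is easy to reproduce outside efl. a rough sketch,
assuming nothing beyond plain POSIX: a reader only sees EOF once every copy of
the write end is closed, so a leftover grandchild that inherited it keeps the
reader blocked even after its parent has exited. pipe_at_eof() and
demo_leaked_write_end() are made-up names for this example, and the sleeping
grandchild merely stands in for the timeout binary.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the read end reports EOF within timeout_ms, else 0. */
static int
pipe_at_eof(int rfd, int timeout_ms)
{
   struct pollfd p = { .fd = rfd, .events = POLLIN };
   char c;

   if (poll(&p, 1, timeout_ms) <= 0) return 0; /* a writer still exists */
   return read(rfd, &c, 1) == 0;
}

/* Fork a child that forks a sleeping grandchild and then exits. Reports
 * whether the pipe hit EOF before and after the grandchild is killed.
 * Returns 0 on success, -1 on setup failure. */
static int
demo_leaked_write_end(int *eof_before, int *eof_after)
{
   int fds[2], pidpipe[2];
   pid_t child, grandchild = -1;

   if (pipe(fds) == -1 || pipe(pidpipe) == -1) return -1;
   child = fork();
   if (child == -1) return -1;
   if (child == 0)
     {
        pid_t gc = fork();

        if (gc == 0)
          {
             sleep(60); /* stand-in for timeout: holds fds[1] open */
             _exit(0);
          }
        /* tell the parent who the grandchild is, then exit without it */
        if (write(pidpipe[1], &gc, sizeof(gc)) != (ssize_t)sizeof(gc))
          _exit(1);
        _exit(0);
     }
   close(fds[1]);
   close(pidpipe[1]);
   if (read(pidpipe[0], &grandchild, sizeof(grandchild))
       != (ssize_t)sizeof(grandchild))
     grandchild = -1;
   close(pidpipe[0]);
   waitpid(child, NULL, 0);
   if (grandchild <= 0) /* never pass -1 or 0 to kill() */
     {
        close(fds[0]);
        return -1;
     }
   *eof_before = pipe_at_eof(fds[0], 200);  /* child gone, grandchild alive */
   kill(grandchild, SIGKILL);
   *eof_after = pipe_at_eof(fds[0], 2000);  /* last writer gone */
   close(fds[0]);
   return 0;
}
```

the first check is the situation wc is in: the direct child is gone, but the
read end still has a writer somewhere. only once the sleeper dies does the
second check see EOF.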

committing some fixes to work around this /bin/sh oddity along with the above
fork+execl return checks to be less optimistic.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
Carsten Haitzler - ras...@rasterman.com



_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
