On Tue, 30 Jul 2013 16:16:28 +0200 Roland Mainz wrote: > On Tue, Jul 30, 2013 at 4:12 PM, Glenn Fowler <[email protected]> wrote: > > > > On Tue, 30 Jul 2013 15:40:48 +0200 Roland Mainz wrote: > >> --047d7b6d8d7cb1934204e2bac182 > >> Content-Type: text/plain; charset=ISO-8859-1 > > > >> On Mon, Jul 29, 2013 at 3:34 PM, Roland Mainz <[email protected]> > >> wrote: > >> > On Sat, Jul 27, 2013 at 6:17 PM, Glenn Fowler <[email protected]> > >> > wrote: > >> [snip] > >> > I hit (again) one of the nasty and hard to reproduce problems (which > >> > means: No... I don't have a testcase... ;-( ) related to job > >> > management: > >> > -- snip -- > >> [snip] > >> > -- snip -- > >> > > >> > ... this hang occured because ksh93 tries to do job management from > >> > within the signal handler... which is IMO _far_ from being safe to > >> > do... > >> > > >> > David: Is there *ANY* way the job management can be moved out of the > >> > signal handler and just rely on the siginfo data ? > > > >> Attached (as > >> "astksh20130727_prototype_process_sigchld_outside_signalhandler001.diff.txt") > >> is a prototype (hacked) patch for discussion. > > > >> * The good news are: > >> - SIGCHLD processing now seems to be reliable. I can't find a simple > >> testcase to break it (but see the item about the testsuite below) > >> - SIGCHLD processing now works without problems under "valgrind" control > >> - The patch seems to fix a lot of Cedric's issues with signals being > >> lost. Even the testcase where a $ ... while ! wait ; sleep 8 ; done # > >> was loosing signals now works... if I replace the "sleep 8" with > >> "compound -a dummy ; poll -t 8 dummy" (so there seems to be a seperate > >> problem with "sleep" ... somehow) > > > >> * The bad news are: > >> - The patch doesn't work very well for interactive shells > >> - The "sigchld.sh" testsuite module fails in several places... AFAIK > >> the reason is that we don't check the list of queued CHLD signals in > >> all places where they should be checked (e.g. each time before we > >> execute a shell statement and before and after running external > >> processes and each time wait(1) internally wakes up from a signal) > >> - $ jobs -l # reports already purged processes... erm... no clue why... ;-( > > > >> * Notes about interactive mode: > >> An interactive shell AFAIK requires to report terminating childs > >> "immediately" if some shell options are set (erm... David: Which > >> option is this again ?). AFAIK this can be archived by two ways: > >> 1. wake up from time-to-time (e.g. polling) ... but this has a bad > >> effect in CPU usage (imagine 1000 shell sessions in a multi-user > >> system waking-up 10 times per second... or imagine the impact for > >> battery life in a laptop) > >> 2. add some glue to the editor code to wake up for each signald and > >> process it when |sfpkrd()| returns... > >> -- snip -- > >> (gdb) where > >> #0 0x00007fc2be588860 in __poll_nocancel () at > >> ../sysdeps/unix/syscall-template.S:81 > >> #1 0x000000000050280d in sfpkrd (fd=0, argbuf=0x7fff33545080, n=80, > >> rc=0, tm=-1, action=1) > >> at > >> /home/test001/work/ast_ksh_20130727/build_i386_64bit_debug_patched/src/lib/libast/sfio/sfpkrd.c:142 > >> #2 0x0000000000410fb9 in ed_read (context=0x7fc2bf063960, fd=0, > >> buff=0x7fff33545080 "\240PT3\377\177", size=80, reedit=0) > >> at > >> /home/test001/work/ast_ksh_20130727/build_i386_64bit_debug_patched/src/cmd/ksh93/edit/edit.c:884 > >> -- snip -- > >> ... that's the strategy my patch attempts... > > > > for the current code do you see failures when the vmalloc debug method is > > not used?
> Sporadically... yes. But I'm not sure whether this is an issue in > vmalloc or something else. I'm not even sure whether the job lkist > handling code in ksh93 is the root cause of most of the trouble. This > is all made much harder because Linux doesn't have much usefull tools > to instrument signal handlers so I have to walk the code by hand. it might be good to nail down this problem first otherwise we might be applying fixes for something that's not broken I just yesterday plugged a few EINTR holes in src/lib/libast/vmalloc/vmdcsystem.c that may be part of what you saw _______________________________________________ ast-developers mailing list [email protected] http://lists.research.att.com/mailman/listinfo/ast-developers
