*Synopsis*: *ksh93* signal handling is broken

CR 6782948 changed on Jan 25 2011 by <User 1-I30-737>

=== Field ============ === New Value ============= === Old Value =============

See Also               7008357

====================== =========================== ===========================

*Change Request ID*: 6782948

*Synopsis*: *ksh93* signal handling is broken

  Product: solaris
  Category: shell
  Subcategory: korn93
  Type: Defect
  Subtype:
  Status: 3-Accepted
  Substatus:
  Priority: 2-High
  Introduced In Release: solaris_nevada
  Introduced In Build:
  Responsible Engineer:
  Keywords:

=== *Description* ============================================================
i'm currently running opensolaris snv_99. i noticed some of my shell
scripts hanging when running in ksh93. the problem seems to be centered
around signal trap and child handling within ksh93. i managed to boil
my scripts down to an easily reproducible test case. you can reproduce
the problem by running the following script:
---8<---
#!/bin/ksh

foo()
{
	exit 0;
}

trap foo EXIT

/bin/yes | while read yes; do
	(/bin/date)
done
---8<---

then in another window run the following command:
---8<---
while :; do kill -WINCH <script pid>; done
---8<---

this will quickly result in a failure. i've seen the shell process emit
error messages, hard hang (requiring a kill -9), and/or core dump.

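The unbounded `kill -WINCH` loop above keeps running even after the target has died. As a convenience only, here is a minimal sketch of a bounded variant that stops once the target process no longer exists. The function name `winch_storm` and its `max_signals` argument are hypothetical and not part of the original report. The sketch can only notice that the process has gone away (exit or core dump); it cannot detect the hard-hang case.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): a bounded version of
# the "while :; do kill -WINCH <script pid>; done" loop.
#
# usage:   winch_storm <pid> [max_signals]
# returns: 0 if the target still existed after max_signals signals,
#          1 if the target disappeared before that.
winch_storm() {
    pid=$1
    max=${2:-100000}    # assumed default; pick whatever bound suits the test
    sent=0
    while [ "$sent" -lt "$max" ]; do
        # kill fails once no process with this pid exists any more
        if ! kill -WINCH "$pid" 2>/dev/null; then
            echo "target $pid gone after $sent signals"
            return 1
        fi
        sent=$((sent + 1))
    done
    echo "target $pid still alive after $sent signals"
    return 0
}
```

Run it from a second window, as in the steps above, with the pid of the test script as the first argument. If the target is a child of the shell calling `winch_storm`, an exited but unreaped child still accepts signals, so the early-exit check only works reliably against a process started elsewhere.
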
here's the stack trace from my originally hung shell script:
---8<---
<email address omitted>$ pstack 15252
15252:	/bin/ksh -p /home/edp/work/bin/explorer_extract
 ff2c5910 waitid (7, 0, ffbfdad8, f)
 0002993c job_wait (3ba2, 57400, 57800, 0, 1, f0) + 1b0
 00038090 sh_exec (0, 57800, 0, 57800, 57800, 0) + 2198
 000371fc sh_exec (8000, 58e538, 4, 57400, ffbfe0ac, 1) + 1304
 0003722c sh_exec (58e0e0, 4, 9004, 3, 8004, 58e280) + 1334
 000324cc sh_funct (58e0e0, 56b56c, 4, 0, 0, 56b120) + 144
 00036450 sh_exec (56b120, 4, 509208, 4e97f8, 509208, 57800) + 558
 0003722c sh_exec (56b460, 4, 57400, 3, 57400, 56b410) + 1334
 00037708 sh_exec (56acd8, 57800, 0, 57800, 57800, 57800) + 1810
 000371fc sh_exec (8000, 56b480, 14, 57400, ffbff154, 1) + 1304
 0002d87c exfile (80000, 57800, 57800, 100000, 57800, 57800) + 73c
 0002d110 main (2f400, 57400, 4fd948, 57400, 57800, 57800) + a1c
 00019450 _start (0, 0, 0, 0, 0, 0) + 108
---8<---

in the cases when ksh93 core dumps, the stack traces vary, but the
thing that they all have in common is a call to _ast_[cm]alloc(), which
seems to indicate some kind of memory corruption. here's one example
stack trace that shows the memory allocation function being called
from a signal handler.
(normally memory allocation within signal handlers immediately indicates
an application bug since the libc memory allocation interfaces are not
async-signal safe, but ksh93 seems to provide its own memory allocation
interfaces, and i don't know if they are async-signal safe):
---8<---
core 'core.ksh93.348315.89769' of 348315:	/bin/ksh ./bug.sh
 fffffd7ffefc18db bestsearch () + 1ab
 fffffd7ffefc22a3 bestalloc () + 263
 fffffd7ffefc102a _ast_malloc () + 9a
 fffffd7fff0dc7b3 nv_putval () + 8b3
 fffffd7fff0c1468 sh_fault () + 88
 fffffd7fff2750b6 __sighndlr () + 6
 fffffd7fff2698ef call_user_handler () + 2ff
 fffffd7fff269af9 sigacthandler (14, 0, fffffd7fffdfae30) + c9
 --- called from signal handler with signal 20 (SIGWINCH) ---
 fffffd7ffef3404a dthash () + 7a
 fffffd7fff0e0b9b nv_search () + 6b
 fffffd7fff0e9931 path_spawn () + 91
 fffffd7fff0f98ed sh_ntfork () + 3bd
 fffffd7fff0f7127 sh_exec () + 2ff7
 fffffd7fff0f03d6 sh_subshell () + 596
 fffffd7fff0f5dfb sh_exec () + 1ccb
 fffffd7fff0f536c sh_exec () + 123c
 fffffd7fff0f7589 sh_exec () + 3459
 fffffd7fff0f5c44 sh_exec () + 1b14
 fffffd7fff0d98e6 exfile () + 766
 fffffd7fff0d910b sh_main () + 7ab
 0000000000400db1 main () + 21
 0000000000400c3c ???????? ()
---8<---
*** (#1 of 2): 2008-12-09 20:56:00 GMT+00:00 <User 1-5Q-4162>

i still see this problem with snv_106, although it's harder to
reproduce than before.
here's how the core dump looks now:
---8<---
<email address omitted>$ pstack core
core 'core' of 136269:	/bin/ksh ./fail2.ksh
 fffffd7ffeffb665 bestsearch () + 1a5
 fffffd7ffeffb8c8 bestreclaim () + 1f8
 fffffd7ffeffc0ea bestalloc () + 2ca
 fffffd7ffeffad84 _ast_malloc () + 9c
 fffffd7fff13b19d nv_putval () + 525
 fffffd7fff0f7091 sh_readline () + 1099
 fffffd7fff0f5d0a b_read () + 542
 fffffd7fff15ea23 sh_exec () + 2deb
 fffffd7fff15cfee sh_exec () + 13b6
 fffffd7fff15f78c sh_exec () + 3b54
 fffffd7fff15d9c8 sh_exec () + 1d90
 fffffd7fff136e86 exfile () + 786
 fffffd7fff136676 sh_main () + 7fe
 0000000000400e72 main () + 52
 0000000000400ccc ???????? ()
---8<---

another strange thing i noticed is that if you follow the steps above
to reproduce the problem, the output from the date command changes once
you start sending signals to ksh93. here's how it looks:
---8<---
Tuesday, February 3, 2009 12:46:28 PM PST
Tuesday, February 3, 2009 12:46:30 PM PST
Tue Feb 3 12:46:30 PST 2009
Tue Feb 3 12:46:37 PST 2009
---8<---
*** (#2 of 2): 2009-02-03 20:47:37 GMT+00:00 <User 1-5Q-4162>

=== *Public Comments* ========================================================

=== *Workaround* =============================================================

=== *Additional Details* =====================================================
  Targeted Release:
  Commit To Fix In Build:
  Fixed In Build:
  Integrated In Build:
  Verified In Build:
  See Also: 6876768, 7008357
  Duplicate of:
  Hooks:
  Hook1:
  Hook2:
  Hook3:
  Hook4:
  Hook5:
  Hook6:
  Program Management:
  Root Cause:
  Fix Affects Documentation: No
  Fix Affects Localization: No

=== *History* ================================================================
  Date Submitted: 2008-12-09 20:56:00 GMT+00:00
  Submitted By: <User 1-5Q-4162>

  Status Changed   Date Updated                    Updated By
  3-Accepted       2009-01-17 00:11:42 GMT+00:00   <User 1-5Q-5151>
  2-Incomplete     2009-08-03 15:03:50 GMT+00:00   <User 1-1SURPB>
  3-Accepted       2009-08-03 15:14:16 GMT+00:00   <User 1-1SURPB>

=== *Service Request* ========================================================
  Impact: Significant
  Functionality: Secondary
  Severity: 3
  Product Name: solaris
  Product Release: solaris_nevada
  Product Build:
  Operating System: snv_99
  Hardware: generic
  Submitted Date: 2008-12-09 20:56:00 GMT+00:00

=== *Multiple Release (MR) Cluster* - 0 ======================================

_______________________________________________
ksh93-integration-discuss mailing list
ksh93-integration-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/ksh93-integration-discuss