*Synopsis*: *ksh93* signal handling is broken

CR 6782948 changed on Jan 25 2011 by <User 1-I30-737>

=== Field ============ === New Value ============= === Old Value =============

See Also               7008357                                                
====================== =========================== ===========================

     
*Change Request ID*: 6782948


  Product: solaris
  Category: shell
  Subcategory: korn93
  Type: Defect
  Subtype: 
  Status: 3-Accepted
  Substatus: 
  Priority: 2-High
  Introduced In Release: solaris_nevada
  Introduced In Build: 
  Responsible Engineer: 
  Keywords: 

=== *Description* ============================================================
i'm currently running opensolaris snv_99.
i noticed some of my shell scripts hanging when run under ksh93.
the problem seems to be centered around signal traps and child process
handling within ksh93.  i managed to boil my scripts down to an easily
reproducible test case.

you can reproduce the problem by running the following script:
---8<---
#!/bin/ksh

foo() { exit 0; }
trap foo EXIT

/bin/yes | while read yes; do
        (/bin/date)
done

---8<---

then in another window run the following command:
---8<---
while :; do kill -WINCH <script pid>; done
---8<---

this quickly results in a failure.  i've seen the shell
process emit error messages, hang hard (requiring a kill -9),
and/or dump core.

here's the stack trace from my originally hung shell script:
---8<---
<email address omitted>$ pstack 15252
15252:  /bin/ksh -p /home/edp/work/bin/explorer_extract
 ff2c5910 waitid   (7, 0, ffbfdad8, f)
 0002993c job_wait (3ba2, 57400, 57800, 0, 1, f0) + 1b0
 00038090 sh_exec  (0, 57800, 0, 57800, 57800, 0) + 2198
 000371fc sh_exec  (8000, 58e538, 4, 57400, ffbfe0ac, 1) + 1304
 0003722c sh_exec  (58e0e0, 4, 9004, 3, 8004, 58e280) + 1334
 000324cc sh_funct (58e0e0, 56b56c, 4, 0, 0, 56b120) + 144
 00036450 sh_exec  (56b120, 4, 509208, 4e97f8, 509208, 57800) + 558
 0003722c sh_exec  (56b460, 4, 57400, 3, 57400, 56b410) + 1334
 00037708 sh_exec  (56acd8, 57800, 0, 57800, 57800, 57800) + 1810
 000371fc sh_exec  (8000, 56b480, 14, 57400, ffbff154, 1) + 1304
 0002d87c exfile   (80000, 57800, 57800, 100000, 57800, 57800) + 73c
 0002d110 main     (2f400, 57400, 4fd948, 57400, 57800, 57800) + a1c
 00019450 _start   (0, 0, 0, 0, 0, 0) + 108
---8<---

in the cases when ksh93 dumps core, the stack traces vary, but the
thing they all have in common is a call to _ast_[cm]alloc(),
which seems to indicate some kind of memory corruption.  here's one
example stack trace that shows the memory allocation function
being called from a signal handler.  (normally, memory allocation
within a signal handler immediately indicates an application bug,
since the libc memory allocation interfaces are not async-signal
safe, but ksh93 seems to provide its own memory allocation
interfaces, and i don't know whether they are async-signal safe):
---8<---
core 'core.ksh93.348315.89769' of 348315:       /bin/ksh ./bug.sh
 fffffd7ffefc18db bestsearch () + 1ab
 fffffd7ffefc22a3 bestalloc () + 263
 fffffd7ffefc102a _ast_malloc () + 9a
 fffffd7fff0dc7b3 nv_putval () + 8b3
 fffffd7fff0c1468 sh_fault () + 88
 fffffd7fff2750b6 __sighndlr () + 6
 fffffd7fff2698ef call_user_handler () + 2ff
 fffffd7fff269af9 sigacthandler (14, 0, fffffd7fffdfae30) + c9
 --- called from signal handler with signal 20 (SIGWINCH) ---
 fffffd7ffef3404a dthash () + 7a
 fffffd7fff0e0b9b nv_search () + 6b
 fffffd7fff0e9931 path_spawn () + 91
 fffffd7fff0f98ed sh_ntfork () + 3bd
 fffffd7fff0f7127 sh_exec () + 2ff7
 fffffd7fff0f03d6 sh_subshell () + 596
 fffffd7fff0f5dfb sh_exec () + 1ccb
 fffffd7fff0f536c sh_exec () + 123c
 fffffd7fff0f7589 sh_exec () + 3459
 fffffd7fff0f5c44 sh_exec () + 1b14
 fffffd7fff0d98e6 exfile () + 766
 fffffd7fff0d910b sh_main () + 7ab
 0000000000400db1 main () + 21
 0000000000400c3c ???????? ()
---8<---

*** (#1 of 2): 2008-12-09 20:56:00 GMT+00:00 <User 1-5Q-4162>

i still see this problem with snv_106, although it's harder to reproduce
than before.  here's how the core dump looks now:
---8<---
<email address omitted>$ pstack core
core 'core' of 136269:  /bin/ksh ./fail2.ksh
 fffffd7ffeffb665 bestsearch () + 1a5
 fffffd7ffeffb8c8 bestreclaim () + 1f8
 fffffd7ffeffc0ea bestalloc () + 2ca
 fffffd7ffeffad84 _ast_malloc () + 9c
 fffffd7fff13b19d nv_putval () + 525
 fffffd7fff0f7091 sh_readline () + 1099
 fffffd7fff0f5d0a b_read () + 542
 fffffd7fff15ea23 sh_exec () + 2deb
 fffffd7fff15cfee sh_exec () + 13b6
 fffffd7fff15f78c sh_exec () + 3b54
 fffffd7fff15d9c8 sh_exec () + 1d90
 fffffd7fff136e86 exfile () + 786
 fffffd7fff136676 sh_main () + 7fe
 0000000000400e72 main () + 52
 0000000000400ccc ???????? ()
---8<---

another strange thing i noticed is that if you follow the steps above to
reproduce the problem, the format of the date command's output changes
once you start sending signals to ksh93.  here's how it looks:
---8<---
Tuesday, February  3, 2009 12:46:28 PM PST
Tuesday, February  3, 2009 12:46:30 PM PST
Tue Feb  3 12:46:30 PST 2009
Tue Feb  3 12:46:37 PST 2009
---8<---

*** (#2 of 2): 2009-02-03 20:47:37 GMT+00:00 <User 1-5Q-4162>


=== *Public Comments* ========================================================

=== *Workaround* =============================================================

=== *Additional Details* =====================================================
        Targeted Release: 
        Commit To Fix In Build: 
        Fixed In Build: 
        Integrated In Build: 
        Verified In Build: 
  See Also: 6876768, 7008357
  Duplicate of: 
  Hooks:
        Hook1: 
        Hook2: 
        Hook3: 
        Hook4: 
        Hook5: 
        Hook6: 
  Program Management: 
  Root Cause: 
  Fix Affects Documentation: No
  Fix Affects Localization: No

=== *History* ================================================================
        Date Submitted: 2008-12-09 20:56:00 GMT+00:00
        Submitted By: <User 1-5Q-4162>

        Status Changed    Date Updated                  Updated By
        3-Accepted        2009-01-17 00:11:42 GMT+00:00 <User 1-5Q-5151>
        2-Incomplete      2009-08-03 15:03:50 GMT+00:00 <User 1-1SURPB>
        3-Accepted        2009-08-03 15:14:16 GMT+00:00 <User 1-1SURPB>


=== *Service Request* ========================================================
        Impact: Significant
        Functionality: Secondary
        Severity: 3
        Product Name: solaris
        Product Release: solaris_nevada
        Product Build: 
        Operating System: snv_99
        Hardware: generic
        Submitted Date: 2008-12-09 20:56:00 GMT+00:00


=== *Multiple Release (MR) Cluster* - 0 ======================================

_______________________________________________
ksh93-integration-discuss mailing list
ksh93-integration-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/ksh93-integration-discuss
