*Synopsis*: ksh93 hangs in situations that ksh handles okay

CR 6631006 changed on Nov 23 2009 by <User 1-2S67RN>

=== Field ============ === New Value ============= === Old Value =============

Integrated in Build    snv_128                                                
Status                 10-Fix Delivered            8-Fix Available            
====================== =========================== ===========================

     
*Change Request ID*: 6631006

*Synopsis*: ksh93 hangs in situations that ksh handles okay

  Product: solaris
  Category: shell
  Subcategory: korn93
  Type: Defect
  Subtype: 
  Status: 10-Fix Delivered
  Substatus: 
  Priority: 2-High
  Introduced In Release: solaris_nevada
  Introduced In Build: snv_72
  Responsible Engineer: <User 1-7MTUEB>
  Keywords: oss-request, oss-sponsor

=== *Description* ============================================================
This morning some of the elements in my $PATH were inaccessible due to
an offline NFS server.  When I logged in, my GNOME terminal window
didn't give me a shell prompt.  When I entered a ^C, the window went
away.

I then logged in as root and hid /usr/bin/ksh93, so that my login
scripts would use /usr/bin/ksh instead of /usr/bin/ksh93.  I then
logged in as myself, and my GNOME terminal window gave me the
expected prompt.

I then tried running ksh93 by hand; it was indeed stuck trying to
access one of the inaccessible directories:

    athyra$ truss -p 8666
    stat("/ws/onnv-tools/onbld/bin", 0xFFFFFFFF7FFFE588) (sleeping...)

This appears to be hard to recover from, since one usually needs a
functional shell before one can change one's shell.

ksh93 needs to be at least as robust as the Solaris ksh in
circumstances like this before it can replace the Solaris ksh.

And there's some question in my mind whether ksh93 should be the
default root shell if it hangs in situations like this.  (Though I
suppose it's questionable practice for root to have NFS directories in
its PATH.  So maybe this isn't a critical issue.)

*** (#1 of 2): 2007-11-16 18:04:11 GMT+00:00 <User 1-5Q-12482>

[dep, 15Apr2009]

  This is especially bad considering ksh93 is installed as /bin/sh,
  which means every system(3C) call will hang on startup regardless of
  its dependence on PATH resolution beyond known local entries (usually 
  first in one's path for this reason).

*** (#2 of 2): 2009-04-16 00:44:28 GMT+00:00 <User 1-5Q-4224>


=== *Public Comments* ========================================================
Are you able to reproduce it with build 111?

*** (#1 of 6): 2009-04-16 07:32:20 GMT+00:00 <User 1-1SURPB>

[dep, 16Apr2009]

  ksh93 appears to have the same behavior on build 112.

*** (#2 of 6): 2009-04-16 20:14:59 GMT+00:00 <User 1-5Q-4224>

[dep, 13Aug2009]

  (In response to an unnecessarily non-public comment claiming this has
  something to do with fancy stuck filesystem detection in Sun's ksh88,
  and that somehow caching file descriptors and using openat will
  magically solve the problem.)

  This is *NOT* a matter of Sun's ksh detecting stuck filesystems.
  This is a matter of ksh93 scanning your entire path on startup,
  whereas Sun's ksh (and more importantly, sh) simply did not.
  Period.

  My PATH:

    ; echo $PATH
    /home/dep/private/bin:/home/dep/bin/i386:/home/dep/bin:/usr/bin:/usr/sbin:/usr/openwin/bin:/usr/sfw/bin:/ws/onnv-tools/SUNWspro/SS11/bin:/ws/onnv-tools/SUNWspro/SOS8/bin:/ws/onnv-tools/onbld/bin:/ws/onnv-tools/onbld/bin/i386:/usr/ccs/bin:/usr/java/bin

  Eliminate effect of dot files:

    ; mkdir /tmp/foo
    ; HOME=/tmp/foo

  stats and opens from ksh (or /usr/xpg4/bin/sh):

    ; truss -t stat,open ksh
    stat64("/usr/bin/ksh", 0x08047608)              = 0
    open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
    stat64("/lib/libc.so.1", 0x08046E08)            = 0
    open("/lib/libc.so.1", O_RDONLY)                = 3
    stat64("/home/dep", 0x080476F0)                 = 0
    stat64(".", 0x08047780)                         = 0
    stat64("/home/dep", 0x08047720)                 = 0
    stat64(".", 0x080477B0)                         = 0
    stat64("/home/dep", 0x08047720)                 = 0
    stat64(".", 0x080477B0)                         = 0
    open64("", O_RDWR|O_APPEND|O_CREAT, 0600)       Err#2 ENOENT
    open64("/tmp/sh827332.1", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
    $

  stats and opens from sh:

    ; truss -t stat,open sh
    stat64("/sbin/sh", 0x08047610)                  = 0
    open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
    stat64("/lib/libc.so.1", 0x08046E10)            = 0
    open("/lib/libc.so.1", O_RDONLY)                = 3
    $

  stats and opens from ksh93:

    ; truss -t stat,open ksh93
    stat64("/usr/bin/ksh93", 0x08047604)            = 0
    open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
    stat64("/lib/libc.so.1", 0x08046E04)            = 0
    open("/lib/libc.so.1", O_RDONLY)                = 3
    open("/proc/self/auxv", O_RDONLY)               = 3
    stat("/usr/bin/amd64/ksh93", 0xFFFFFD7FFFDFF550) = 0
    open("/var/ld/64/ld.config", O_RDONLY)          Err#2 ENOENT
    stat("/lib/64/libc.so.1", 0xFFFFFD7FFFDFE9F0)   = 0
    open("/lib/64/libc.so.1", O_RDONLY)             = 3
    stat("/lib/64/libshell.so.1", 0xFFFFFD7FFFDFEBC0) Err#2 ENOENT
    stat("/usr/lib/64/libshell.so.1", 0xFFFFFD7FFFDFEBC0) = 0
    open("/usr/lib/64/libshell.so.1", O_RDONLY)     = 3
    stat("/lib/64/libcmd.so.1", 0xFFFFFD7FFFDFE730) Err#2 ENOENT
    stat("/usr/lib/64/libcmd.so.1", 0xFFFFFD7FFFDFE730) = 0
    open("/usr/lib/64/libcmd.so.1", O_RDONLY)       = 3
    stat("/lib/64/libast.so.1", 0xFFFFFD7FFFDFE2A0) Err#2 ENOENT
    stat("/usr/lib/64/libast.so.1", 0xFFFFFD7FFFDFE2A0) = 0
    open("/usr/lib/64/libast.so.1", O_RDONLY)       = 3
    stat("/lib/64/libm.so.2", 0xFFFFFD7FFFDFE730)   = 0
    open("/lib/64/libm.so.2", O_RDONLY)             = 3
    stat("/dev/null", 0xFFFFFD7FFFDFF180)           = 0
    stat("/home/dep", 0xFFFFFD7FFFDFF110)           = 0
    stat(".", 0xFFFFFD7FFFDFF190)                   = 0
    stat("/home/dep/private/bin", 0xFFFFFD7FFFDFF4C0) = 0
    open("/home/dep/private/bin/.paths", O_RDONLY)  Err#2 ENOENT
    stat("/home/dep/bin/i386", 0xFFFFFD7FFFDFF4C0)  = 0
    open("/home/dep/bin/i386/.paths", O_RDONLY)     Err#2 ENOENT
    stat("/home/dep/bin", 0xFFFFFD7FFFDFF4C0)       = 0
    open("/home/dep/bin/.paths", O_RDONLY)          Err#2 ENOENT
    stat("/usr/bin", 0xFFFFFD7FFFDFF4C0)            = 0
    open("/usr/bin/.paths", O_RDONLY)               Err#2 ENOENT
    stat("/usr/sbin", 0xFFFFFD7FFFDFF4C0)           = 0
    open("/usr/sbin/.paths", O_RDONLY)              Err#2 ENOENT
    stat("/usr/openwin/bin", 0xFFFFFD7FFFDFF4C0)    = 0
    open("/usr/openwin/bin/.paths", O_RDONLY)       Err#2 ENOENT
    stat("/usr/sfw/bin", 0xFFFFFD7FFFDFF4C0)        = 0
    open("/usr/sfw/bin/.paths", O_RDONLY)           Err#2 ENOENT
    stat("/ws/onnv-tools/SUNWspro/SS11/bin", 0xFFFFFD7FFFDFF4C0) = 0
    open("/ws/onnv-tools/SUNWspro/SS11/bin/.paths", O_RDONLY) Err#2 ENOENT
    stat("/ws/onnv-tools/SUNWspro/SOS8/bin", 0xFFFFFD7FFFDFF4C0) = 0
    open("/ws/onnv-tools/SUNWspro/SOS8/bin/.paths", O_RDONLY) Err#2 ENOENT
    stat("/ws/onnv-tools/onbld/bin", 0xFFFFFD7FFFDFF4C0) = 0
    open("/ws/onnv-tools/onbld/bin/.paths", O_RDONLY) Err#2 ENOENT
    stat("/ws/onnv-tools/onbld/bin/i386", 0xFFFFFD7FFFDFF4C0) = 0
    open("/ws/onnv-tools/onbld/bin/i386/.paths", O_RDONLY) Err#2 ENOENT
    stat("/usr/ccs/bin", 0xFFFFFD7FFFDFF4C0)        = 0
    open("/usr/ccs/bin/.paths", O_RDONLY)           Err#2 ENOENT
    stat("/usr/java/bin", 0xFFFFFD7FFFDFF4C0)       = 0
    open("/usr/java/bin/.paths", O_RDONLY)          Err#2 ENOENT
    open("/etc/ksh.kshrc", O_RDONLY)                = 3
    open("/tmp/foo/.kshrc", O_RDONLY)               Err#2 ENOENT
    open("", O_RDWR|O_APPEND|O_CREAT, 0600)         Err#2 ENOENT
    open("/tmp/astv6s.919", O_RDWR|O_APPEND|O_CREAT, 0600) = 3
        Received signal #18, SIGCLD, in waitid() [caught]
          siginfo: SIGCLD CLD_EXITED pid=827335 status=0x0000
    <email address omitted>:/home/dep$

  As you can see, even though nothing actually made use of my PATH,
  ksh93 performed a stat and open for each PATH element.  This
  preliminary scan of the path is costly and unnecessary, and makes
  ksh93 unusable in many situations.
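
A rough way to check a trace like the ones above for this behaviour, offered as an illustrative sketch and not as part of the original analysis: the Python helper below (the name path_elements_touched is made up here) takes truss output lines and a PATH value and lists the PATH elements that show up as a stat or open target. It assumes the truss line format shown in this report.

```python
import posixpath
import re

# Matches the path argument of truss lines such as:
#   stat("/usr/bin", 0xFFFFFD7FFFDFF4C0)            = 0
#   open("/usr/bin/.paths", O_RDONLY)               Err#2 ENOENT
TRUSS_CALL = re.compile(r'^\s*(stat|stat64|open|open64)\("([^"]*)"')


def path_elements_touched(truss_lines, path_value):
    """Return the PATH elements that appear as a stat/open target.

    An element counts as touched if a traced call names the directory
    itself or an entry directly inside it. Order of first appearance
    in the trace is preserved.
    """
    elements = [e for e in path_value.split(":") if e]
    touched = []
    for line in truss_lines:
        m = TRUSS_CALL.match(line)
        if not m:
            continue  # signal notes, prompts, other syscalls
        target = m.group(2)
        parent = posixpath.dirname(target)
        for elem in elements:
            if (target == elem or parent == elem) and elem not in touched:
                touched.append(elem)
    return touched
```

Fed the ksh93 trace and the PATH above, this should list every PATH element; fed the sh trace, it should list none.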

  Even bash doesn't do this (it searches PATH to find itself, but only
  uses as much as it needs):

    ; truss -t open,stat bash
    stat64("/usr/bin/bash", 0x08047608)             = 0
    open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
    stat64("/lib/libcurses.so.1", 0x08046E08)       = 0
    open("/lib/libcurses.so.1", O_RDONLY)           = 3
    stat64("/lib/libsocket.so.1", 0x08046E08)       = 0
    open("/lib/libsocket.so.1", O_RDONLY)           = 3
    stat64("/lib/libnsl.so.1", 0x08046E08)          = 0
    open("/lib/libnsl.so.1", O_RDONLY)              = 3
    stat64("/lib/libdl.so.1", 0x08046E08)           = 0
    open("/lib/libdl.so.1", O_RDONLY)               = 3
    stat64("/lib/libc.so.1", 0x08046E08)            = 0
    open("/lib/libc.so.1", O_RDONLY)                = 3
    open64("/dev/tty", O_RDWR|O_NONBLOCK)           = 3
    stat64("/dev/pts/0", 0x08047810)                = 0
    open64("/var/run/name_service_door", O_RDONLY)  = 3
    stat64("/home/dep", 0x08047690)                 = 0
    stat64(".", 0x08047720)                         = 0
    stat64(".", 0x080476C0)                         = 0
    stat64("/home/dep/private/bin/bash", 0x080475C0) Err#2 ENOENT
    stat64("/home/dep/bin/i386/bash", 0x080475C0)   Err#2 ENOENT
    stat64("/home/dep/bin/bash", 0x080475C0)        Err#2 ENOENT
    stat64("/usr/bin/bash", 0x080475C0)             = 0
    stat64("/usr/bin/bash", 0x080475E0)             = 0
    open64("/tmp/foo/.bashrc", O_RDONLY)            Err#2 ENOENT
    open64("", O_RDONLY)                            Err#2 ENOENT
    open("/home/dep/.terminfo/x/xterm", O_RDONLY)   Err#2 ENOENT
    open("/usr/share/lib/terminfo//x/xterm", O_RDONLY) = 4
        Received signal #20, SIGWINCH [caught]
    stat64("/tmp/foo/.inputrc", 0x08046ED0)         Err#2 ENOENT
    stat64("/etc/inputrc", 0x08046ED0)              Err#2 ENOENT
        Received signal #20, SIGWINCH [caught]
    bash-3.2$

  Moreover, none of zsh, csh, or tcsh scans the PATH on startup.  This
  is a ksh93-only phenomenon.

*** (#3 of 6): 2009-08-13 21:07:10 GMT+00:00 <User 1-5Q-4224>

Update from Roland:


1. The original ksh88i build from the AT&T sources behaves the same way as 
ksh93 version 's' and scans the PATH at startup. That's why I _guessed_ that 
someone had modified Solaris's ksh88 to behave differently (as a side effect, 
neither Solaris ksh88 nor the derived /usr/xpg4/bin/sh conforms to POSIX/SUS if 
they no longer check for this; see [2]).


2. The POSIX/SUS standard _requires_ that shells scan all elements of PATH when 
they try to find a command. This even happens for builtin commands when they 
are bound to a specific PATH, since such bound builtins are only allowed to be 
executed if there is a matching file in the filesystem.


3. The results of the PATH scan are allowed to be cached. That's why, in one of 
the next ksh93 versions, we're going to switch to |openat()| on the directories 
in PATH at the time PATH is set or changed (but first we need to complete 
ksh93-integration update2). If there is a way to detect stuck NFS filesystems, 
we're going to add the matching code in that version.
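
To make the caching idea in point 3 concrete, here is a minimal Python sketch. It is not ksh93 code and says nothing about how ksh93 implements this. CommandCache and its searches counter are hypothetical names; shutil.which from the Python standard library stands in for the shell's PATH search.

```python
import shutil


class CommandCache:
    """Remember where each command was found for a given PATH value."""

    def __init__(self):
        self._path_value = None
        self._found = {}
        self.searches = 0  # number of real PATH searches performed

    def lookup(self, name, path_value):
        # A changed PATH invalidates everything learned so far.
        if path_value != self._path_value:
            self._path_value = path_value
            self._found = {}
        if name not in self._found:
            self.searches += 1
            # Misses (None) are cached too, until PATH changes.
            self._found[name] = shutil.which(name, path=path_value)
        return self._found[name]
```

Note that the first lookup of each command still walks PATH, so a cache of this kind does not by itself avoid a hang on an unreachable directory that precedes the match.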

*** (#4 of 6): 2009-09-17 10:54:08 GMT+00:00 <User 1-5Q-6276>

POSIX/SUS says (definition of PATH) that:

"The list shall be searched from beginning to end, applying the filename
to each prefix, until an executable file with the specified name and
appropriate execution permissions is found."

So, the shell doesn't need to scan all elements of PATH.
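
The search the quoted text describes can be sketched as follows. This is an illustration in Python, not any shell's implementation; find_command and its probed parameter are names invented for the example.

```python
import os


def find_command(name, path_value, probed=None):
    """Search PATH left to right and stop at the first executable match.

    `probed`, if given, is a list that receives every candidate path
    examined, which makes it easy to see where the search stopped.
    """
    for prefix in path_value.split(":"):
        # An empty PATH element means the current directory.
        candidate = os.path.join(prefix or ".", name)
        if probed is not None:
            probed.append(candidate)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Directories listed after the first match are never touched, so an unreachable directory late in PATH only matters for commands that are not found earlier.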

*** (#5 of 6): 2009-09-17 18:34:44 GMT+00:00 <User 1-5Q-4028>

Copying the evaluation to public comments here, so Roland can read it.
====
There are two scenarios in which the shell can hang when an NFS path is present 
in the PATH variable.

*) When ksh93 is invoked, it does a stat on every directory present in the 
PATH variable and tries to open a .paths file in each. If PATH contains an NFS 
directory that is not reachable, ksh93 hangs. 

1  86632                       open:entry   /usr/openwin/bin/.paths
              libc.so.1`__open_syscall+0xa
              libc.so.1`open+0x137
              libshell.so.1`path_chkpaths+0xcc
              libshell.so.1`path_addcomp+0x3f2
              libshell.so.1`path_addpath+0xc1
              libshell.so.1`path_init+0x70
              libshell.so.1`path_opentype+0x51
              libshell.so.1`path_open+0xb
              libshell.so.1`sh_source+0x30
              libshell.so.1`sh_main+0x43f
              ksh93`main+0x52
              ksh93`0x400ccc

*) When a command is executed under ksh93, it does a stat on the file in the 
directories listed in PATH. This can also cause a hang if the NFS file server 
is not reachable.

# dtrace -n 'syscall::*stat*:entry /execname=="ksh93"/{ 
trace(copyinstr(arg0));}'
dtrace: description 'syscall::*stat*:entry ' matched 15 probes
CPU     ID                    FUNCTION:NAME
  1  86658                       stat:entry   /opt/SUNWspro/bin/ls
  1  86658                       stat:entry   /usr/X11R6/bin/ls
  1  86658                       stat:entry   /usr/dt/bin/ls
  1  86658                       stat:entry   /usr/local/bin/ls
  1  86658                       stat:entry   /usr/bin/ls
  1  86774                      lstat:entry   /usr/bin/ls

====
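
Both scenarios come down to a blocking stat or open on an unreachable directory. One conceivable mitigation, which the evaluation does not propose and which is not what the delivered fix is known to do, is to probe each directory with a time limit before trusting it. The Python sketch below shows the idea; dir_responds, responsive_path, and the 2-second default are assumptions made for illustration.

```python
import os
import threading


def dir_responds(path, timeout=2.0):
    """Return True if os.stat(path) completes within `timeout` seconds.

    The stat runs in a daemon thread so that a call blocked on an
    unreachable server cannot block the caller. Returns False on
    timeout or on any OSError (missing directory, no permission).
    """
    result = {}

    def probe():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)  # give up waiting after `timeout` seconds
    return result.get("ok", False)


def responsive_path(path_value, timeout=2.0):
    """Drop PATH elements whose directory does not answer in time."""
    return ":".join(
        elem for elem in path_value.split(":")
        if elem and dir_responds(elem, timeout)
    )
```

The trade-off is that a stuck probe thread stays blocked in the kernel, and every unreachable directory costs a full timeout, so this limits the delay but does not remove it.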

*** (#6 of 6): 2009-10-07 10:50:51 GMT+00:00 <User 1-5Q-5197>


=== *Workaround* =============================================================

=== *Additional Details* =====================================================
        Targeted Release: solaris_nevada
        Commit To Fix In Build: snv_128
        Fixed In Build: snv_128
        Integrated In Build: snv_128
        Verified In Build: 
  See Also: 6437624, 6793763
  Duplicate of: 
  Hooks:
        Hook1: 
        Hook2: 
        Hook3: 
        Hook4: 
        Hook5: <email address omitted>
        Hook6: <email address omitted>
  Program Management: Fix Integrated into Source
  Root Cause: Other - see Research Activity
  Fix Affects Documentation: No
  Fix Affects Localization: No

=== *History* ================================================================
        Date Submitted: 2007-11-16 18:04:11 GMT+00:00
        Submitted By: <User 1-5Q-12482>

        Status Changed    Date Updated                  Updated By
        3-Accepted        2008-08-20 22:57:39 GMT+00:00 <User 1-5Q-5151>
        2-Incomplete      2009-04-16 07:32:19 GMT+00:00 <User 1-1SURPB>
        3-Accepted        2009-04-16 20:14:59 GMT+00:00 <User 1-5Q-4224>
        5-Cause Known     2009-10-06 09:35:50 GMT+00:00 <User 1-GN0KC>
        7-Fix in Progress 2009-10-23 17:23:50 GMT+00:00 <User 1-7MTUEB>
        8-Fix Available   2009-10-28 18:23:28 GMT+00:00 <User 1-5HNZ8F>
        10-Fix Delivered  2009-11-23 05:16:40 GMT+00:00 <User 1-2S67RN>


=== *Service Request* ========================================================
        Impact: Significant
        Functionality: Secondary
        Severity: 3
        Product Name: solaris
        Product Release: solaris_nevada
        Product Build: 
        Operating System: snv_77
        Hardware: ultrasparc
        Submitted Date: 2007-11-16 18:04:11 GMT+00:00


=== *Service Request* ========================================================
        Impact: Significant
        Functionality: Primary
        Severity: 2
        Product Name: solaris
        Product Release: solaris_nevada
        Product Build: snv_110
        Operating System: snv_110
        Hardware: generic
        Submitted Date: 2009-04-16 00:44:28 GMT+00:00


=== *Service Request* ========================================================
        Impact: Critical
        Functionality: Primary
        Severity: 1
        Product Name: solaris
        Product Release: solaris_nevada
        Product Build: snv_122
        Operating System: snv_122
        Hardware: generic
        Submitted Date: 2009-09-15 17:34:12 GMT+00:00


=== *Multiple Release (MR) Cluster* - 0 ======================================
