*Synopsis*: ksh93 hangs in situations that ksh handles okay
CR 6631006 changed on Nov 23 2009 by <User 1-2S67RN>
=== Field ============ === New Value ============= === Old Value =============
Integrated in Build    snv_128
Status                 10-Fix Delivered            8-Fix Available
====================== =========================== ===========================

*Change Request ID*: 6631006
*Synopsis*: ksh93 hangs in situations that ksh handles okay
Product: solaris
Category: shell
Subcategory: korn93
Type: Defect
Subtype:
Status: 10-Fix Delivered
Substatus:
Priority: 2-High
Introduced In Release: solaris_nevada
Introduced In Build: snv_72
Responsible Engineer: <User 1-7MTUEB>
Keywords: oss-request, oss-sponsor

=== *Description* ============================================================
This morning some of the elements in my $PATH were inaccessible due to an offline NFS server. When I logged in, my GNOME terminal window didn't give me a shell prompt. When I entered a ^C, the window went away.

I then logged in as root and hid /usr/bin/ksh93, so that my login scripts would use /usr/bin/ksh instead of /usr/bin/ksh93. I then logged in as myself, and my GNOME terminal window gave me the expected prompt.

I then tried running ksh93 by hand; it was indeed stuck trying to access one of the inaccessible directories:

athyra$ truss -p 8666
stat("/ws/onnv-tools/onbld/bin", 0xFFFFFFFF7FFFE588) (sleeping...)

This appears to be hard to recover from, since one usually needs a functional shell before one can change one's shell. ksh93 needs to be at least as robust as the Solaris ksh in circumstances like this before it can replace the Solaris ksh. And there's some question in my mind whether ksh93 should be the default root shell if it hangs in situations like this. (Though I suppose it's questionable practice for root to have NFS directories in its PATH. So maybe this isn't a critical issue.)
*** (#1 of 2): 2007-11-16 18:04:11 GMT+00:00 <User 1-5Q-12482>

[dep, 15Apr2009] This is especially bad considering ksh93 is installed as /bin/sh, which means every system(3C) call will hang on startup, regardless of whether the command depends on PATH resolution beyond the known local entries (which are usually first in one's path for this reason).
*** (#2 of 2): 2009-04-16 00:44:28 GMT+00:00 <User 1-5Q-4224>

=== *Public Comments* ========================================================
Are you able to reproduce it with build 111?
*** (#1 of 6): 2009-04-16 07:32:20 GMT+00:00 <User 1-1SURPB>

[dep, 16Apr2009] ksh93 appears to have the same behavior on build 112.
*** (#2 of 6): 2009-04-16 20:14:59 GMT+00:00 <User 1-5Q-4224>

[dep, 13Aug2009] (In response to an unnecessarily non-public comment claiming this has something to do with fancy stuck-filesystem detection in Sun's ksh88, and that somehow caching file descriptors and using openat will magically solve the problem.)

This is *NOT* a matter of Sun's ksh detecting stuck filesystems. This is a matter of ksh93 scanning your entire path on startup, whereas Sun's ksh (and, more importantly, sh) simply did not. Period.
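The distinction drawn here is between an eager scan, which touches every PATH element before any command runs, and the lazy search in the POSIX/SUS PATH definition quoted later in this CR, which tries each prefix in order and stops at the first executable match. The POSIX sh sketch below models both under stated assumptions; the function names `startup_scan_targets` and `path_lookup` are hypothetical and are not taken from ksh93 or libshell. The eager model only prints the stat/open pairs that the ksh93 truss output in this comment shows (it never touches the filesystem), while the lazy search really tests files, so an unreachable directory listed *before* the match would still block it, as in the second scenario of the evaluation.

```shell
# Sketch only: hypothetical POSIX sh models of two PATH strategies.
# Neither function is ksh93's actual code.

# startup_scan_targets [PATHLIST]
# Eager model: for every element of PATHLIST (default: $PATH), print the
# stat()/open(".paths") pair seen in the ksh93 startup truss output.
# It only prints names and never touches the filesystem, so it cannot hang.
startup_scan_targets() {
    _rest=${1-$PATH}
    _last=
    while :; do
        case $_rest in
            *:*) _dir=${_rest%%:*}; _rest=${_rest#*:} ;;
            *)   _dir=$_rest; _last=1 ;;
        esac
        printf 'stat %s\nopen %s/.paths\n' "$_dir" "$_dir"
        [ -z "$_last" ] || break
    done
}

# path_lookup NAME [PATHLIST]
# Lazy search: try each prefix of PATHLIST (default: $PATH) from beginning
# to end and stop at the first regular file with execute permission.
# Directories after the match are never examined; a stuck directory
# before the match still blocks the [ -f ] / [ -x ] tests.
path_lookup() {
    _name=$1
    _rest=${2-$PATH}
    _last=
    while :; do
        case $_rest in
            *:*) _dir=${_rest%%:*}; _rest=${_rest#*:} ;;
            *)   _dir=$_rest; _last=1 ;;
        esac
        # A zero-length PATH prefix means the current working directory.
        [ -n "$_dir" ] || _dir=.
        if [ -f "$_dir/$_name" ] && [ -x "$_dir/$_name" ]; then
            printf '%s\n' "$_dir/$_name"
            return 0
        fi
        [ -z "$_last" ] || break
    done
    return 1
}
```

For example, `path_lookup ls "/opt/SUNWspro/bin:/usr/bin"` prints the first executable `ls` it finds, or returns 1 if there is none; with no second argument both functions read `$PATH`. The PATH list is passed as an optional argument only so the sketch can be exercised without changing the caller's real PATH.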
My PATH:

; echo $PATH
/home/dep/private/bin:/home/dep/bin/i386:/home/dep/bin:/usr/bin:/usr/sbin:/usr/openwin/bin:/usr/sfw/bin:/ws/onnv-tools/SUNWspro/SS11/bin:/ws/onnv-tools/SUNWspro/SOS8/bin:/ws/onnv-tools/onbld/bin:/ws/onnv-tools/onbld/bin/i386:/usr/ccs/bin:/usr/java/bin

Eliminate effect of dot files:

; mkdir /tmp/foo
; HOME=/tmp/foo

stats and opens from ksh (or /usr/xpg4/bin/sh):

; truss -t stat,open ksh
stat64("/usr/bin/ksh", 0x08047608) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libc.so.1", 0x08046E08) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
stat64("/home/dep", 0x080476F0) = 0
stat64(".", 0x08047780) = 0
stat64("/home/dep", 0x08047720) = 0
stat64(".", 0x080477B0) = 0
stat64("/home/dep", 0x08047720) = 0
stat64(".", 0x080477B0) = 0
open64("", O_RDWR|O_APPEND|O_CREAT, 0600) Err#2 ENOENT
open64("/tmp/sh827332.1", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
$

stats and opens from sh:

; truss -t stat,open sh
stat64("/sbin/sh", 0x08047610) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libc.so.1", 0x08046E10) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
$

stats and opens from ksh93:

; truss -t stat,open ksh93
stat64("/usr/bin/ksh93", 0x08047604) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libc.so.1", 0x08046E04) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
open("/proc/self/auxv", O_RDONLY) = 3
stat("/usr/bin/amd64/ksh93", 0xFFFFFD7FFFDFF550) = 0
open("/var/ld/64/ld.config", O_RDONLY) Err#2 ENOENT
stat("/lib/64/libc.so.1", 0xFFFFFD7FFFDFE9F0) = 0
open("/lib/64/libc.so.1", O_RDONLY) = 3
stat("/lib/64/libshell.so.1", 0xFFFFFD7FFFDFEBC0) Err#2 ENOENT
stat("/usr/lib/64/libshell.so.1", 0xFFFFFD7FFFDFEBC0) = 0
open("/usr/lib/64/libshell.so.1", O_RDONLY) = 3
stat("/lib/64/libcmd.so.1", 0xFFFFFD7FFFDFE730) Err#2 ENOENT
stat("/usr/lib/64/libcmd.so.1", 0xFFFFFD7FFFDFE730) = 0
open("/usr/lib/64/libcmd.so.1", O_RDONLY) = 3
stat("/lib/64/libast.so.1", 0xFFFFFD7FFFDFE2A0) Err#2 ENOENT
stat("/usr/lib/64/libast.so.1", 0xFFFFFD7FFFDFE2A0) = 0
open("/usr/lib/64/libast.so.1", O_RDONLY) = 3
stat("/lib/64/libm.so.2", 0xFFFFFD7FFFDFE730) = 0
open("/lib/64/libm.so.2", O_RDONLY) = 3
stat("/dev/null", 0xFFFFFD7FFFDFF180) = 0
stat("/home/dep", 0xFFFFFD7FFFDFF110) = 0
stat(".", 0xFFFFFD7FFFDFF190) = 0
stat("/home/dep/private/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/home/dep/private/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/home/dep/bin/i386", 0xFFFFFD7FFFDFF4C0) = 0
open("/home/dep/bin/i386/.paths", O_RDONLY) Err#2 ENOENT
stat("/home/dep/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/home/dep/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/sbin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/sbin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/openwin/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/openwin/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/sfw/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/sfw/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/SUNWspro/SS11/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/SUNWspro/SS11/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/SUNWspro/SOS8/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/SUNWspro/SOS8/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/onbld/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/onbld/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/onbld/bin/i386", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/onbld/bin/i386/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/ccs/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/ccs/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/java/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/java/bin/.paths", O_RDONLY) Err#2 ENOENT
open("/etc/ksh.kshrc", O_RDONLY) = 3
open("/tmp/foo/.kshrc", O_RDONLY) Err#2 ENOENT
open("", O_RDWR|O_APPEND|O_CREAT, 0600) Err#2 ENOENT
open("/tmp/astv6s.919", O_RDWR|O_APPEND|O_CREAT, 0600) = 3
    Received signal #18, SIGCLD, in waitid() [caught]
      siginfo: SIGCLD CLD_EXITED pid=827335 status=0x0000
<email address omitted>:/home/dep$

As you can see, even though nothing actually made use of my PATH, ksh93 performed a stat and an open for each PATH element. This preliminary scan of the path is costly and unnecessary, and makes ksh93 unusable in many situations.

Even bash doesn't do this (it searches PATH to find itself, but only uses as much as it needs):

; truss -t open,stat bash
stat64("/usr/bin/bash", 0x08047608) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libcurses.so.1", 0x08046E08) = 0
open("/lib/libcurses.so.1", O_RDONLY) = 3
stat64("/lib/libsocket.so.1", 0x08046E08) = 0
open("/lib/libsocket.so.1", O_RDONLY) = 3
stat64("/lib/libnsl.so.1", 0x08046E08) = 0
open("/lib/libnsl.so.1", O_RDONLY) = 3
stat64("/lib/libdl.so.1", 0x08046E08) = 0
open("/lib/libdl.so.1", O_RDONLY) = 3
stat64("/lib/libc.so.1", 0x08046E08) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
open64("/dev/tty", O_RDWR|O_NONBLOCK) = 3
stat64("/dev/pts/0", 0x08047810) = 0
open64("/var/run/name_service_door", O_RDONLY) = 3
stat64("/home/dep", 0x08047690) = 0
stat64(".", 0x08047720) = 0
stat64(".", 0x080476C0) = 0
stat64("/home/dep/private/bin/bash", 0x080475C0) Err#2 ENOENT
stat64("/home/dep/bin/i386/bash", 0x080475C0) Err#2 ENOENT
stat64("/home/dep/bin/bash", 0x080475C0) Err#2 ENOENT
stat64("/usr/bin/bash", 0x080475C0) = 0
stat64("/usr/bin/bash", 0x080475E0) = 0
open64("/tmp/foo/.bashrc", O_RDONLY) Err#2 ENOENT
open64("", O_RDONLY) Err#2 ENOENT
open("/home/dep/.terminfo/x/xterm", O_RDONLY) Err#2 ENOENT
open("/usr/share/lib/terminfo//x/xterm", O_RDONLY) = 4
    Received signal #20, SIGWINCH [caught]
stat64("/tmp/foo/.inputrc", 0x08046ED0) Err#2 ENOENT
stat64("/etc/inputrc", 0x08046ED0) Err#2 ENOENT
    Received signal #20, SIGWINCH [caught]
bash-3.2$

Moreover, none of zsh, csh, or tcsh scans the PATH on startup. This is a ksh93-only phenomenon.
*** (#3 of 6): 2009-08-13 21:07:10 GMT+00:00 <User 1-5Q-4224>

Update from Roland:

1. The original ksh88i build from the AT&T sources behaves the same way as ksh93 version 's' and scans the PATH at startup. That's why I _guessed_ that someone has modified Solaris's ksh88 to behave differently (as a side effect, neither Solaris ksh88 nor the derived /usr/xpg4/bin/sh conforms to POSIX/SUS if they no longer check for this (see [2])).

2. The POSIX/SUS standard _requires_ that shells scan all elements of PATH when they try to find a command. This even happens for builtin commands when they are bound to a specific PATH, since such bound builtins are only allowed to be executed if there is a matching file in the filesystem.

3. The results of the PATH scan are allowed to be cached. That's why, for one of the next ksh93 versions, we're going to switch to using |openat()| with the directories in PATH, opened at the time PATH is set or changed (but first we need to complete ksh93-integration update2). If there is a way to detect stuck NFS filesystems, we're going to add the matching code with that version.
*** (#4 of 6): 2009-09-17 10:54:08 GMT+00:00 <User 1-5Q-6276>

POSIX/SUS says (definition of PATH) that:

"The list shall be searched from beginning to end, applying the filename to each prefix, until an executable file with the specified name and appropriate execution permissions is found."

So the shell doesn't need to scan all elements of PATH.
*** (#5 of 6): 2009-09-17 18:34:44 GMT+00:00 <User 1-5Q-4028>

Copying the evaluation to public comments here, so Roland can read it.
====
There are two scenarios in which the shell can hang when an NFS path is present in the PATH variable.

*) When ksh93 is invoked, it does a stat on every directory present in the PATH variable and tries to open the .paths file in each one. If an unreachable NFS directory is present in PATH, the ksh93 shell hangs.
  1  86632                       open:entry /usr/openwin/bin/.paths
              libc.so.1`__open_syscall+0xa
              libc.so.1`open+0x137
              libshell.so.1`path_chkpaths+0xcc
              libshell.so.1`path_addcomp+0x3f2
              libshell.so.1`path_addpath+0xc1
              libshell.so.1`path_init+0x70
              libshell.so.1`path_opentype+0x51
              libshell.so.1`path_open+0xb
              libshell.so.1`sh_source+0x30
              libshell.so.1`sh_main+0x43f
              ksh93`main+0x52
              ksh93`0x400ccc

*) When a command is executed under ksh93, it does a stat on the file in each directory in PATH in turn, until the command is found. This can also cause a hang if the NFS fileserver is not reachable.

# dtrace -n 'syscall::*stat*:entry /execname=="ksh93"/{ trace(copyinstr(arg0));}'
dtrace: description 'syscall::*stat*:entry ' matched 15 probes
CPU     ID                    FUNCTION:NAME
  1  86658                       stat:entry /opt/SUNWspro/bin/ls
  1  86658                       stat:entry /usr/X11R6/bin/ls
  1  86658                       stat:entry /usr/dt/bin/ls
  1  86658                       stat:entry /usr/local/bin/ls
  1  86658                       stat:entry /usr/bin/ls
  1  86774                      lstat:entry /usr/bin/ls
====
*** (#6 of 6): 2009-10-07 10:50:51 GMT+00:00 <User 1-5Q-5197>

=== *Workaround* =============================================================

=== *Additional Details* =====================================================
Targeted Release: solaris_nevada
Commit To Fix In Build: snv_128
Fixed In Build: snv_128
Integrated In Build: snv_128
Verified In Build:
See Also: 6437624, 6793763
Duplicate of:
Hooks:
Hook1:
Hook2:
Hook3:
Hook4:
Hook5: <email address omitted>
Hook6: <email address omitted>
Program Management: Fix Integrated into Source
Root Cause: Other - see Research Activity
Fix Affects Documentation: No
Fix Affects Localization: No

=== *History* ================================================================
Date Submitted: 2007-11-16 18:04:11 GMT+00:00
Submitted By: <User 1-5Q-12482>

Status Changed       Date Updated                    Updated By
3-Accepted           2008-08-20 22:57:39 GMT+00:00   <User 1-5Q-5151>
2-Incomplete         2009-04-16 07:32:19 GMT+00:00   <User 1-1SURPB>
3-Accepted           2009-04-16 20:14:59 GMT+00:00   <User 1-5Q-4224>
5-Cause Known        2009-10-06 09:35:50 GMT+00:00   <User 1-GN0KC>
7-Fix in Progress    2009-10-23 17:23:50 GMT+00:00   <User 1-7MTUEB>
8-Fix Available      2009-10-28 18:23:28 GMT+00:00   <User 1-5HNZ8F>
10-Fix Delivered     2009-11-23 05:16:40 GMT+00:00   <User 1-2S67RN>

=== *Service Request* ========================================================
Impact: Significant
Functionality: Secondary
Severity: 3
Product Name: solaris
Product Release: solaris_nevada
Product Build:
Operating System: snv_77
Hardware: ultrasparc
Submitted Date: 2007-11-16 18:04:11 GMT+00:00

=== *Service Request* ========================================================
Impact: Significant
Functionality: Primary
Severity: 2
Product Name: solaris
Product Release: solaris_nevada
Product Build: snv_110
Operating System: snv_110
Hardware: generic
Submitted Date: 2009-04-16 00:44:28 GMT+00:00

=== *Service Request* ========================================================
Impact: Critical
Functionality: Primary
Severity: 1
Product Name: solaris
Product Release: solaris_nevada
Product Build: snv_122
Operating System: snv_122
Hardware: generic
Submitted Date: 2009-09-15 17:34:12 GMT+00:00

=== *Multiple Release (MR) Cluster* - 0 ======================================