*Synopsis*: *ksh93* VARIABLE=`command substitution` assignment is not reliable on OpenSolaris
CR 6745015 changed on Jan 16 2009 by <User 1-5Q-3253>

=== Field ============ === New Value ============= === Old Value =============
Integrated in Build    snv_106
Status                 10-Fix Delivered            8-Fix Available
====================== =========================== ===========================

*Change Request ID*: 6745015
*Synopsis*: *ksh93* VARIABLE=`command substitution` assignment is not reliable on OpenSolaris

Product: solaris
Category: shell
Subcategory: korn93
Type: Defect
Subtype: Reliability
Status: 10-Fix Delivered
Substatus:
Priority: 2-High
Introduced In Release: solaris_nevada
Introduced In Build: snv_72
Responsible Engineer: <User 1-5Q-5151>
Keywords: backprimes, backquotes, command, oss-request, oss-sponsor, substitution

=== *Description* ============================================================
The Bourne shell in OpenSolaris 2008.11 snv_93 X86 does not reliably handle back-tick command substitution such as

    PORT=`my_command my_args`

The command in question writes to both standard error (which should be discarded) and standard output, which should be the value assigned to the shell variable PORT.

The attached script ("vglconnect") works on Solaris 8, 9, 10, and Linux of many flavors. This is part of the Sun Shared Visualization Software product's SUNWvgl package (VirtualGL). The line that generally fails on OpenSolaris is line 95 below:

    80  if [ ! "$VGL_PORT" = "" -a "$__VGL_SSHTUNNEL" = "1" ]; then
    81      PORT=$VGL_PORT
    82  else
    83      VGLCLIENT=`dirname $0`/vglclient
    84      if [ ! -x $VGLCLIENT ]; then
    85          if [ -x /opt/VirtualGL/bin/vglclient ]; then
    86              VGLCLIENT=/opt/VirtualGL/bin/vglclient
    87          else
    88              if [ -x /opt/SUNWvgl/bin/vglclient ]; then
    89                  VGLCLIENT=/opt/SUNWvgl/bin/vglclient
    90              else
    91                  VGLCLIENT=vglclient
    92              fi
    93          fi
    94      fi
    95      PORT=`$VGLCLIENT $VGLARGS`
    96      if [ $? -ne 0 -o "$PORT" = "" ]; then
    97          echo "[VGL] ERROR: vglclient failed to execute."
    98          exit 1
    99      fi
   100      echo
   101  fi

On a host that has VirtualGL installed and is not yet running a vglclient daemon (started implicitly by vglconnect), invoking vglconnect with a hostname argument should start the vglclient daemon, get its port number (4242), and issue an ssh -X to the hostname provided as its argument. (ssh prompts for your password; vglconnect's argument may also be <email address omitted> since ssh accepts that syntax, as well.)

Inserting set -x before that line shows that PORT is being assigned a null string "" rather than the stdout of the process, which is a port number (4242).

We changed lines 95 and 96 to instead save vglclient's stdout in a file. At that point the similar command

   100      PORT=`/bin/cat /tmp/$$`

would not reliably assign the port number to our PORT variable. Though sometimes it worked, we could not characterize what made it succeed or fail. Our tiny test cases all succeeded.

The workaround we used was to instead test the file for size, avoiding all dependency on the assignment to our shell variable. See Workaround.

*** (#1 of 8): 2008-09-04 23:11:39 GMT+00:00 <User 1-5Q-2845>

Reporter: Can you please check whether the issue is fixed with the new ksh93 binaries from http://www.opensolaris.org/os/project/ksh93-integration/downloads/2008-08-10/ ?

*** (#2 of 8): 2008-09-11 15:31:32 GMT+00:00 <User 1-6Y4MMS>

BTW: Looking at this line:

    95      PORT=`$VGLCLIENT $VGLARGS`

Does command "$VGLCLIENT" run once and exit, or does it |fork()| into the background? This _may_ ([1]) explain the timing issue with the "sleep".

[1]=(... but I suspect this is one of two other bugs we fixed in ksh93-integration update1 already, but I'd like to rule out the timing problem first)

*** (#3 of 8): 2008-09-11 15:43:07 GMT+00:00 <User 1-6Y4MMS>

I downloaded and installed the bits from http://www.opensolaris.org/os/project/ksh93-integration/downloads/2008-08-10/ and the problem still exists.

The $VGLCLIENT does fork() into the background and continue running.
It is a daemon process. Something that might be helpful is that if we run $VGLCLIENT while another $VGLCLIENT is already running, $PORT is correctly captured by the shell. In this instance, the process does not fork() and just exits, and more text is written to stderr (indicating that another $VGLCLIENT is already running).

*** (#4 of 8): 2008-09-11 16:31:37 GMT+00:00 <User 1-5Q-7324>

> The $VGLCLIENT does fork() into the background and continue running. It is a
> daemon process.

Is it possible that the information you need is written by the background process (the |fork()|'ed one) and not the foreground one (the one started by the shell itself)? It looks like we have a race condition then (and ksh93 is "guilty" of being faster than the original Bourne shell).

What happens if you change...
-- snip --
VARIABLE=`command substitution`
-- snip --
... to ...
-- snip --
VARIABLE=`command substitution ; sleep 5`
-- snip --
Does this work?

*** (#5 of 8): 2008-09-11 16:44:09 GMT+00:00 <User 1-6Y4MMS>

Yes, the output we are trying to capture is written by the fork()'d child process. Adding the sleep fixes the problem, e.g.

    PORT=`$VGLCLIENT $VGLARGS; sleep 5`

works correctly.

*** (#6 of 8): 2008-09-11 17:05:25 GMT+00:00 <User 1-5Q-7324>

> Yes, the output we are trying to capture is written by the fork()'d
> child process.

The command substitution in a shell can only reliably capture the output of a foreground process. The background process writes to the same (shell-internal) temporary file, but the shell will only capture the data that was completely written at the point where the foreground process returns. That's a "classical" race condition... ;-(

ksh93 has the (unfortunate) issue that it is a lot faster than the original Bourne shell and is therefore more likely to hit this specific race condition, since the time between the exit of the foreground process and the capture of the output is far smaller (that's why injecting a "sleep" works).
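Editorial illustration of the race described above: a minimal sketch, assuming a POSIX shell and the availability of `mktemp`. It replaces the fixed `sleep` with a bounded poll on an output file. The names `fake_daemon` and `wait_for_output` are hypothetical; `fake_daemon` only imitates a command whose foreground process returns at once while a background child writes the port number later. It is not vglclient and not the fix delivered for this CR.

```shell
#!/bin/sh
# Hypothetical stand-in for a daemonizing command: the function returns
# immediately, and a background child writes the port about 1 second later.
fake_daemon() {
    ( sleep 1; echo 4242 ) > "$1" &
}

# Poll until the file is non-empty, checking once per second for at most
# $2 attempts. Returns 0 if the file has content, non-zero otherwise.
wait_for_output() {
    file=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        [ -s "$file" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    [ -s "$file" ]
}

OUT=$(mktemp)            # assumption: mktemp is available on the host
fake_daemon "$OUT"

if wait_for_output "$OUT" 10; then
    read -r PORT < "$OUT"
else
    PORT=
fi
rm -f "$OUT"

echo "PORT=$PORT"
```

Unlike a fixed `sleep 5`, the poll returns as soon as the output appears and only fails after the whole timeout has elapsed, so a loaded machine makes it slower rather than wrong (up to the chosen limit).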
AFAIK this is "not 'our'" bug, since you get the same result with the Bourne shell and a faster machine, and the script itself needs to be fixed somehow. The "sleep" works as a "workaround" but will still fail on a sufficiently loaded machine (the same applies to "tricks" (or better: "abuse") like

    $ sleep 1 ; sync ; sleep 1 ; sync ; sleep 1 ; sync # etc.

).

AFAIK it may be useful to take the issue to <email address omitted> (please subscribe via http://mail.opensolaris.org/mailman/listinfo/shell-discuss before posting) and discuss a solution there (I have two different solutions in mind but I don't know VGL well enough).

*** (#7 of 8): 2008-09-11 17:32:11 GMT+00:00 <User 1-6Y4MMS>

> The command substitution in a shell can only reliably capture the output of a
> foreground process.

BTW: The statement is not 100% correct. The shell can capture the output of background processes if you wait for them to finish writing their output, e.g.

    $ var="$(/usr/bin/ls -l & wait)"

will run /usr/bin/ls -l as a background job, but the "wait" statement will wait until the background job is complete (which doesn't exactly help in our case, since we have to deal with a daemon process which runs "forever").

*** (#8 of 8): 2008-09-11 17:35:23 GMT+00:00 <User 1-6Y4MMS>

=== *Public Comments* ========================================================

=== *Workaround* =============================================================
This workaround appears to be reliable on OpenSolaris. Instead of using back-tick command substitution, we save the output in a file and test that file's size:

    97      # PORT=`$VGLCLIENT $VGLARGS`
    98      $VGLCLIENT $VGLARGS 1> /tmp/$$
    99      STAT=$?
   100      PORT=`/bin/cat /tmp/$$`
   101      echo "PORT=$PORT, CMD='$VGLCLIENT $VGLARGS'"
   +++      # sleep 1 ; echo "slept 1 second"
   102      [ -s /tmp/$$ ] && echo "[VGL] /tmp/$$ is non-empty."
   103      [ ! -s /tmp/$$ ] && echo "[VGL] /tmp/$$ is empty."
   104      # [ $STAT -ne 0 -o ! -s /tmp/$$ ] && echo "[will exit...]"
   105      if [ $STAT -ne 0 ]; then
   106          echo "[VGL 1] ERROR: vglclient failed to execute."
   107          exit 1
   108      fi
   109      if [ ! -s /tmp/$$ ]; then
   110          echo "[VGL 2] ERROR: vglclient failed to execute."
   111          exit 1
   112      fi
   113      echo
   114  fi

*** (#1 of 1): 2008-09-04 23:11:39 GMT+00:00 <User 1-5Q-2845>

=== *Additional Details* =====================================================
Targeted Release: solaris_nevada
Commit To Fix In Build: snv_106
Fixed In Build: snv_106
Integrated In Build: snv_106
Verified In Build:
See Also: 6437624, 6619428
Duplicate of:
Hooks:
Hook1:
Hook2:
Hook3: no-NAS
Hook4:
Hook5: <email address omitted>
Hook6: <email address omitted>
Program Management:
Root Cause: Other - see Research Activity
Fix Affects Documentation: No
Fix Affects Localization: No

=== *History* ================================================================
Date Submitted: 2008-09-04 23:11:38 GMT+00:00
Submitted By: <User 1-5Q-2845>

Status Changed       Date Updated                    Updated By
3-Accepted           2008-09-10 17:20:13 GMT+00:00   <User 1-5Q-5151>
6-Fix Understood     2008-09-17 19:13:50 GMT+00:00   <User 1-5Q-5151>
7-Fix in Progress    2008-12-22 07:42:35 GMT+00:00   <User 1-5Q-5151>
8-Fix Available      2008-12-27 23:21:53 GMT+00:00   <User 1-5HNZ8F>
10-Fix Delivered     2009-01-16 03:26:44 GMT+00:00   <User 1-5Q-3253>

=== *Service Request* ========================================================
Impact: Significant
Functionality: Primary
Severity: 2
Product Name: solaris
Product Release: solaris_nevada
Product Build:
Operating System: snv_93
Hardware: x86
Submitted Date: 2008-09-04 23:11:39 GMT+00:00

=== *Multiple Release (MR) Cluster* - 0 ======================================