*Synopsis*: *ksh93* VARIABLE=`command substitution` assignment is not reliable 
on OpenSolaris

CR 6745015 changed on Jan 16 2009 by <User 1-5Q-3253>

=== Field ============ === New Value ============= === Old Value =============

Integrated in Build    snv_106                                                
Status                 10-Fix Delivered            8-Fix Available            
====================== =========================== ===========================

     
*Change Request ID*: 6745015

*Synopsis*: *ksh93* VARIABLE=`command substitution` assignment is not reliable 
on OpenSolaris

  Product: solaris
  Category: shell
  Subcategory: korn93
  Type: Defect
  Subtype: Reliability
  Status: 10-Fix Delivered
  Substatus: 
  Priority: 2-High
  Introduced In Release: solaris_nevada
  Introduced In Build: snv_72
  Responsible Engineer: <User 1-5Q-5151>
  Keywords: backprimes, backquotes, command, oss-request, oss-sponsor, 
substitution

=== *Description* ============================================================
The Bourne shell in OpenSolaris 2008.11 snv_93 X86 does not reliably handle
back-tick command substitution such as
        PORT=`my_command my_args`

The command in question outputs both standard error (which should be discarded)
and standard out, which should be the value assigned to the shell variable PORT.
The attached script ("vglconnect") works on Solaris 8, 9, 10, and Linux of many
flavors.  This is part of the Sun Shared Visualization Software product's 
SUNWvgl package (VirtualGL).
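
A minimal sketch of the kind of assignment described above, for illustration
only.  The function emit_port is a hypothetical stand-in for a command that
writes a diagnostic to standard error and a port number to standard out; it is
not part of VirtualGL.  Back-ticks capture standard out only, and the explicit
2>/dev/null here discards the diagnostic.

```shell
#!/bin/sh
# emit_port is a hypothetical stand-in for "my_command my_args":
# it writes a diagnostic to stderr and the value of interest to stdout.
emit_port() {
    echo "diagnostic message" >&2
    echo 4242
}

# Back-ticks capture stdout only; stderr is discarded explicitly here.
PORT=`emit_port 2>/dev/null`
STATUS=$?

echo "PORT=$PORT STATUS=$STATUS"
```

Because emit_port runs entirely in the foreground, this case is not subject to
the failure reported here.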

The line that generally fails on OpenSolaris is line 95 below:

 80 if [ ! "$VGL_PORT" = "" -a "$__VGL_SSHTUNNEL" = "1" ]; then
 81         PORT=$VGL_PORT
 82 else
 83         VGLCLIENT=`dirname $0`/vglclient
 84         if [ ! -x $VGLCLIENT ]; then
 85                 if [ -x /opt/VirtualGL/bin/vglclient ]; then
 86                         VGLCLIENT=/opt/VirtualGL/bin/vglclient
 87                 else
 88                         if [ -x /opt/SUNWvgl/bin/vglclient ]; then
 89                                 VGLCLIENT=/opt/SUNWvgl/bin/vglclient
 90                         else
 91                                 VGLCLIENT=vglclient
 92                         fi
 93                 fi
 94         fi
 95         PORT=`$VGLCLIENT $VGLARGS`
 96         if [ $? -ne 0 -o "$PORT" = "" ]; then
 97                 echo "[VGL] ERROR: vglclient failed to execute."
 98                 exit 1
 99         fi
100         echo
101 fi


On a host that has VirtualGL installed, and is not yet running a vglclient
daemon (started implicitly by vglconnect), invoking vglclient with a hostname
argument should start the vglclient daemon, get its port number (4242), and
issue an ssh -X to the hostname provided as its argument.  (ssh prompts for
your password; vglconnect's argument may also be 
        <email address omitted>
since ssh accepts that syntax, as well.)

Inserting 
        set -x
ahead of that assignment shows that PORT is being assigned a null string "" rather than the
stdout of the process, which is a port number (4242).

We changed lines 95 and 96 to instead save vglclient's stdout in a file.
At that point the similar command
100         PORT=`/bin/cat /tmp/$$`
would not reliably assign the port number to our PORT variable.
Though sometimes it worked, we could not characterize what made it succeed
or fail.  Our tiny test cases all succeeded.

The workaround we used was to instead test the filename for size, avoiding
all dependency on the assignment to our shell variable.  See WorkAround.

*** (#1 of 8): 2008-09-04 23:11:39 GMT+00:00 <User 1-5Q-2845>

Reporter:
Can you please check whether the issue is fixed with the new ksh93 binaries 
from 
http://www.opensolaris.org/os/project/ksh93-integration/downloads/2008-08-10/ ?

*** (#2 of 8): 2008-09-11 15:31:32 GMT+00:00 <User 1-6Y4MMS>

BTW: Looking at this line:
95         PORT=`$VGLCLIENT $VGLARGS`

Does command "$VGLCLIENT" run once and exit or does it |fork()| into the 
background ? This _may_ ([1]) explain the timing issue with the "sleep".

[1]=(... but I suspect this is one of two other bugs we fixed in 
ksh93-integration update1 already but I'd like to rule-out the timing problem 
first)

*** (#3 of 8): 2008-09-11 15:43:07 GMT+00:00 <User 1-6Y4MMS>

I downloaded and installed the bits from 
http://www.opensolaris.org/os/project/ksh93-integration/downloads/2008-08-10/

and the problem still exists.

The $VGLCLIENT does fork() into the background and continues running.  It is a 
daemon process.

Something that might be helpful is that if we run $VGLCLIENT after another 
$VGLCLIENT is already running,  $PORT is correctly captured by the shell.  In 
this instance, the process does not fork(), and just exits, and more text is 
being written to stderr (indicating that another $VGLCLIENT is already running).

*** (#4 of 8): 2008-09-11 16:31:37 GMT+00:00 <User 1-5Q-7324>

> The $VGLCLIENT does fork() into the background and continue running.  It is a 
> daemon process.

Is it possible that the information you need is written by the background 
process (the |fork()|'ed one) and not the foreground one (the one started by the 
shell itself) ? It looks like we have a race condition then (and ksh93 is "guilty" of 
being faster than the original Bourne shell).

What happens if you change...
-- snip --
VARIABLE=`command substitution`
-- snip --
... to ...
-- snip --
VARIABLE=`command substitution ; sleep 5`
-- snip --

Does this work ?

*** (#5 of 8): 2008-09-11 16:44:09 GMT+00:00 <User 1-6Y4MMS>

Yes, the output we are trying to capture is written by the fork()'d child 
process.

Adding the sleep fixes the problem, e.g.
PORT=`$VGLCLIENT $VGLARGS; sleep 5`

works correctly.

*** (#6 of 8): 2008-09-11 17:05:25 GMT+00:00 <User 1-5Q-7324>

> Yes, the output we are trying to capture is written by the fork()'d
> child process.

The command substitution in a shell can only reliably capture the output of a 
foreground process. The background process writes to the same (shell-internal) 
temporary file but the shell will only capture the data that had been completely 
written at the point where the foreground process returns. That's a "classical" 
race condition... ;-(
ksh93 has the (unfortunate) issue that it is a lot faster than the original 
Bourne shell and is therefore more likely to hit this specific race condition, since 
the time between the exit of the foreground process and the capturing of the output 
is far smaller (that's why injecting a "sleep" works).

AFAIK this is "not 'our'" bug since you get the same result with the Bourne 
shell and a faster machine and the script itself needs to be fixed somehow. The 
"sleep" works as a "workaround" but will still fail on a sufficiently loaded 
machine (the same applies to "tricks" (or better: "abuse") like $ sleep 1 ; 
sync ; sleep 1 ; sync ; sleep 1 ; sync # etc.).

AFAIK it may be useful to take the issue to <email address omitted> (please 
subscribe via http://mail.opensolaris.org/mailman/listinfo/shell-discuss before 
posting) and discuss a solution there (I have two different solutions in mind 
but I don't know VGL well enough).

*** (#7 of 8): 2008-09-11 17:32:11 GMT+00:00 <User 1-6Y4MMS>

> The command substitution in a shell can only reliably capture the output of a
> forground process.

BTW: The statement is not 100% correct - the shell can capture output of 
background processes if you wait for them to write their output completely, e.g.
$ var="$(/usr/bin/ls -l & wait)" # will run /usr/bin/ls -l as background job 
but the "wait" statement will wait until the background job is complete (which 
doesn't exactly help in our case since we have to deal with a daemon process 
which runs "forever").
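
The "wait" idiom above can be sketched as a self-contained script.  This is an
illustration only: slow_writer is a hypothetical function standing in for a
background job that produces its output late and then exits.

```shell
#!/bin/sh
# slow_writer is a hypothetical background job: it prints its
# result only after a delay, then exits.
slow_writer() {
    sleep 1
    echo 4242
}

# Start the job in the background inside the substitution, then
# "wait" so the substitution does not finish before the job does.
var=$(slow_writer & wait)

echo "captured: $var"
```

As noted above, this only helps when the background job eventually exits; it
does not apply to a daemon that keeps running.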

*** (#8 of 8): 2008-09-11 17:35:23 GMT+00:00 <User 1-6Y4MMS>


=== *Public Comments* ========================================================

=== *Workaround* =============================================================
This workaround appears to be reliable on OpenSolaris.  Instead of using
back-tick command substitution, we save the output in a file and test
that file's size:


 97 #       PORT=`$VGLCLIENT $VGLARGS`
 98         $VGLCLIENT $VGLARGS 1> /tmp/$$
 99         STAT=$?
100         PORT=`/bin/cat /tmp/$$`
101         echo "PORT=$PORT, CMD='$VGLCLIENT $VGLARGS'"
+++         # sleep 1 ; echo "slept 1 second"
102         [ -s /tmp/$$ ] && echo "[VGL] /tmp/$$ is non-empty."
103         [ ! -s /tmp/$$ ] && echo "[VGL] /tmp/$$ is empty."
104         # [ $STAT -ne 0 -o  ! -s /tmp/$$ ] && echo "[will exit...]"
105         if [ $STAT -ne 0 ]; then
106                 echo "[VGL 1] ERROR: vglclient failed to execute."
107                 exit 1
108         fi
109         if [ ! -s /tmp/$$ ]; then
110                 echo "[VGL 2] ERROR: vglclient failed to execute."
111                 exit 1
112         fi
113         echo
114 fi
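
One possible refinement of this workaround, offered as a sketch rather than a
tested fix for vglconnect: since the port number is written by the background
process, the file may still be empty when the foreground command returns, so
the script can poll the file with a bounded number of retries instead of
testing it once.  The names fake_daemon, portfile and the limit of 10 tries
are assumptions made for this example; fake_daemon merely imitates a command
that returns immediately and writes its port later.

```shell
#!/bin/sh
# fake_daemon is a hypothetical stand-in for a daemonizing client:
# it returns at once and writes the port to the given file later.
fake_daemon() {
    ( sleep 1; echo 4242 ) > "$1" &
}

# portfile is an assumed scratch location for this example.
portfile=${TMPDIR:-/tmp}/vglport.$$
: > "$portfile"

fake_daemon "$portfile"

# Poll until the file is non-empty, giving up after 10 tries
# (the retry limit is arbitrary).
tries=0
while [ ! -s "$portfile" ] && [ "$tries" -lt 10 ]; do
    sleep 1
    tries=$((tries + 1))
done

if [ ! -s "$portfile" ]; then
    echo "ERROR: no port number was written." >&2
    rm -f "$portfile"
    exit 1
fi

PORT=`cat "$portfile"`
rm -f "$portfile"
echo "PORT=$PORT"
```

The retry limit bounds how long the script waits, but a polling loop is still
timing-based, so a sufficiently slow writer would exhaust it.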

*** (#1 of 1): 2008-09-04 23:11:39 GMT+00:00 <User 1-5Q-2845>


=== *Additional Details* =====================================================
        Targeted Release: solaris_nevada
        Commit To Fix In Build: snv_106
        Fixed In Build: snv_106
        Integrated In Build: snv_106
        Verified In Build: 
  See Also: 6437624, 6619428
  Duplicate of: 
  Hooks:
        Hook1: 
        Hook2: 
        Hook3: no-NAS
        Hook4: 
        Hook5: <email address omitted>
        Hook6: <email address omitted>
  Program Management: 
  Root Cause: Other - see Research Activity
  Fix Affects Documentation: No
  Fix Affects Localization: No

=== *History* ================================================================
        Date Submitted: 2008-09-04 23:11:38 GMT+00:00
        Submitted By: <User 1-5Q-2845>

        Status Changed    Date Updated                  Updated By
        3-Accepted        2008-09-10 17:20:13 GMT+00:00 <User 1-5Q-5151>
        6-Fix Understood  2008-09-17 19:13:50 GMT+00:00 <User 1-5Q-5151>
        7-Fix in Progress 2008-12-22 07:42:35 GMT+00:00 <User 1-5Q-5151>
        8-Fix Available   2008-12-27 23:21:53 GMT+00:00 <User 1-5HNZ8F>
        10-Fix Delivered  2009-01-16 03:26:44 GMT+00:00 <User 1-5Q-3253>


=== *Service Request* ========================================================
        Impact: Significant
        Functionality: Primary
        Severity: 2
        Product Name: solaris
        Product Release: solaris_nevada
        Product Build: 
        Operating System: snv_93
        Hardware: x86
        Submitted Date: 2008-09-04 23:11:39 GMT+00:00


=== *Multiple Release (MR) Cluster* - 0 ======================================

