Dear ksh93 experts,

I am working on ksh93 CR 7007665 reported recently by dmittry (dmitry.mikhailiche...@oracle.com). Dmittry also did a lot of initial investigation and pointed us to the root cause of this issue. I would to like to discuss with you the details of this CR. I appreciate any help for its resolution.

The root cause of the issue lies in the lack of synchronization among parent/child ksh instances while performing I/O to file descriptors that share common file offset. The issue is becoming visible on systems with large number of processors. Specifically, in our case, it is happening during the execution of the following eval command in the shell script (source of the shell script is given at the bottom of this email).

eval "copy | copy" < 4496648b.execute.out > hydra_actual_filtered 2>&1

The expected result of this command is to copy the contents of "4496648b.execute.out" to "hydra_actual_filtered". But, in few iteractions, the output produced in "hydra_actual_filtered" is different from the content of input file.

Here is the detailed evaluation so far:

The ksh process processes the I/O redirection operators - >, < and 2>&1 prior to processing the argument to eval command. After I/O redirection operators are processed - stdin, stdout and stderr of main ksh process are setup as below:

               0 - 4496648b.execute.out
               1 - hydra_actual_filtered
               2 - hydra_actual_filtered

Now, ksh evaluates the argument to eval and finds out that a pipeline needs to be setup. So, it creates a pipe, spawns a child process to execute the first command in the pipeline (The second command in the pipeline is executed by the parent ksh process itself).

Now, both parent and child do a series of dup & close operations to get the file descriptors set up as required for the pipeline. Finally, file descriptors are set up as below for the child and parent


               0 - 4496648b.execute.out                      < 0 - pipe
1 - pipe > 1 - hydra_actual_filtered 2 - hydra_actual_filtered 2 - hydra_actual_filtered

               child                                        parent
               ------                                       --------

Among the above descriptors list, 2 of child and 1& 2 of parent share the same file offset.

Now, the child (as part of executing first instance of copy function) starts reading input from stdin (i.e. 4496648b.execute.out) and writes it to stdout (i.e. pipe). The parent reads whatever input it has got over stdin (i.e. pipe ) and writes it to stdout (i.e. hydra_actual_filtered).

The problem is happening when the child is about to exit. As part of exit, setbuf is called over each descriptor stream (_sfcleanup -> setbuf->sfsk->lseek). When setbuf is called for desccriptor 2 in child, there is an lseek attempted in that routine. This lseek by child over desc 2 is overlapping with write by parent over desc 1 and it is resulting in a race condition


The following seems to be the sequence of events taking place :


Let us assume that file offset is 10 before the start of event 1

              event1 1 : child starts lseek over 2  and computes new offset
                          - lseek(2, 0, SEEK_CUR)
                             Here, new computed offet is 10
event 2 : parent initiates a write and writes 10 bytes to desc 1

event 3 : parent updates offset as part of write and write returns. The new offset after write is 20

              event 4 : child updates the computed offset and lseek returns
                               Here, new offet is 10

event 5 : parent initiates another write of 10 bytes.

In fact, write by parent in event 5 should start writing at offset 20. However, because of the intervening lseek, offset got changed and it starts writing again at offset 10, thereby overwriting the previous contents.


I think this is a general problem that can happen anywhere whenever we have multiple processes with open descriptors sharing the file offset. To address this issue, it looks like we need to protect the entry/exit into/out of I/O wrapper routines sfwr, sfrd, sfsk etc in libast. It looks like there is some skelton code implemented to handle this synchronization. It can be seen from following lines in these routines.


      SFMTXDECL(f);

      SFMTXENTER(f,-1);


But, this is not fully implented. With vt_threaded defined as 0 in vtthread.h, the sfmutex routine which is called from SFMUTEXENTER macro simply returns 0 :

 vtthread.h :

   37 /* ast doesn't do threads yet */
   38 #if _PACKAGE_ast && !defined(vt_threaded)
   39 #define vt_threaded     0
   40 #endif


We are thinking that it may be a risky solution to implement mutex locking with PTHREAD_PROCESS_SHARED attribute for interprocess synchronization. Also thinking on the lines of avoding the lseek calls to avoid the race condition.

We appreciate any help on the resolution of this issue. Please let me know if any more information is required.
Looking forward for your inputs.


Regards,
-Sriman


Shell script
--------------
#!/bin/sh

gen_file() {
echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901" echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901" echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901" echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901" echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901" echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"
}

copy() {
   while read line; do
       echo $line
   done
}

gen_file > hydra_pass_filtered
gen_file > 4496648b.execute.out

n=0
while [ $n -lt 1000 ]; do
   eval "copy | copy" < 4496648b.execute.out > hydra_actual_filtered 2>&1
diff hydra_pass_filtered hydra_actual_filtered > hydra_diff_output || exit 5
   n=`expr $n + 1`
done



_______________________________________________
ksh93-integration-discuss mailing list
ksh93-integration-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/ksh93-integration-discuss

Reply via email to