Dear ksh93 experts,
I am working on ksh93 CR 7007665 reported recently by dmittry
(dmitry.mikhailiche...@oracle.com). Dmittry also did a lot of initial
investigation and pointed us to the root cause of this issue. I would to
like to discuss with you the details of this CR. I appreciate any help
for its resolution.
The root cause of the issue lies in the lack of synchronization among
parent/child ksh instances while performing I/O to file descriptors that
share common file offset. The issue is becoming visible on systems with
large number of processors. Specifically, in our case, it is happening
during the execution of the following eval command in the shell script
(source of the shell script is given at the bottom of this email).
eval "copy | copy" < 4496648b.execute.out >
hydra_actual_filtered 2>&1
The expected result of this command is to copy the contents of
"4496648b.execute.out" to "hydra_actual_filtered". But, in few
iteractions, the output produced in "hydra_actual_filtered" is
different from the content of input file.
Here is the detailed evaluation so far:
The ksh process processes the I/O redirection operators - >, < and 2>&1
prior to processing the argument to eval command. After I/O redirection
operators are processed - stdin, stdout and stderr of main ksh process
are setup as below:
0 - 4496648b.execute.out
1 - hydra_actual_filtered
2 - hydra_actual_filtered
Now, ksh evaluates the argument to eval and finds out that a pipeline
needs to be setup. So, it creates a pipe, spawns a child process to
execute the first command in the pipeline (The second command in the
pipeline is executed by the parent ksh process itself).
Now, both parent and child do a series of dup & close operations to
get the file descriptors set up as required for the pipeline.
Finally, file descriptors are set up as below for the child and
parent
0 - 4496648b.execute.out < 0 - pipe
1 - pipe > 1 -
hydra_actual_filtered
2 - hydra_actual_filtered 2 -
hydra_actual_filtered
child parent
------ --------
Among the above descriptors list, 2 of child and 1& 2 of parent share
the same file offset.
Now, the child (as part of executing first instance of copy function)
starts reading input from stdin (i.e. 4496648b.execute.out) and writes
it to stdout (i.e. pipe). The parent reads whatever input it has got
over stdin (i.e. pipe ) and writes it to stdout (i.e.
hydra_actual_filtered).
The problem is happening when the child is about to exit. As part of
exit, setbuf is called over each descriptor stream (_sfcleanup ->
setbuf->sfsk->lseek). When setbuf is called for desccriptor 2 in child,
there is an lseek attempted in that routine. This lseek by child over
desc 2 is overlapping with write by parent over desc 1 and it is
resulting in a race condition
The following seems to be the sequence of events taking place :
Let us assume that file offset is 10 before the start of event 1
event1 1 : child starts lseek over 2 and computes new offset
- lseek(2, 0, SEEK_CUR)
Here, new computed offet is 10
event 2 : parent initiates a write and writes 10 bytes to desc 1
event 3 : parent updates offset as part of write and write returns.
The new offset after write is 20
event 4 : child updates the computed offset and lseek returns
Here, new offet is 10
event 5 : parent initiates another write of 10 bytes.
In fact, write by parent in event 5 should start writing at offset 20.
However, because of the intervening lseek, offset got changed and it
starts writing again at offset 10, thereby overwriting the previous
contents.
I think this is a general problem that can happen anywhere whenever we
have multiple processes with open descriptors sharing the file offset.
To address this issue, it looks like we need to protect the entry/exit
into/out of I/O wrapper routines sfwr, sfrd, sfsk etc in libast. It
looks like there is some skelton code implemented to handle this
synchronization. It can be seen from following lines in these routines.
SFMTXDECL(f);
SFMTXENTER(f,-1);
But, this is not fully implented. With vt_threaded defined as 0 in
vtthread.h, the sfmutex routine which is called from SFMUTEXENTER macro
simply returns 0 :
vtthread.h :
37 /* ast doesn't do threads yet */
38 #if _PACKAGE_ast && !defined(vt_threaded)
39 #define vt_threaded 0
40 #endif
We are thinking that it may be a risky solution to implement mutex
locking with PTHREAD_PROCESS_SHARED attribute for interprocess
synchronization.
Also thinking on the lines of avoding the lseek calls to avoid the race
condition.
We appreciate any help on the resolution of this issue. Please let me
know if any more information is required.
Looking forward for your inputs.
Regards,
-Sriman
Shell script
--------------
#!/bin/sh
gen_file() {
echo
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"
echo
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"
echo
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"
echo
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"
echo
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"
echo
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"
}
copy() {
while read line; do
echo $line
done
}
gen_file > hydra_pass_filtered
gen_file > 4496648b.execute.out
n=0
while [ $n -lt 1000 ]; do
eval "copy | copy" < 4496648b.execute.out > hydra_actual_filtered 2>&1
diff hydra_pass_filtered hydra_actual_filtered > hydra_diff_output
|| exit 5
n=`expr $n + 1`
done
_______________________________________________
ksh93-integration-discuss mailing list
ksh93-integration-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/ksh93-integration-discuss