On 02/08/11 10:30, Srimannarayana Bhavanam wrote:
Roger A. Faulkner wrote:
Not literally, but yes, this is the way Solaris behaves, down in the
kernel.
The cause is that all of the [l]lseek() calls end up going through
common code that ends up setting the file offset to the newly-
computed offset. The code needs to make a special case of this
and not set the file offset if it is not actually being asked to change.
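The special case described above can be illustrated with a small user-space model. This is only a sketch: struct model_file, model_lseek() and the stores counter are hypothetical names invented for illustration and are not taken from the Solaris kernel source. The point is that a seek that resolves to the current offset, such as lseek(fd, 0, SEEK_CUR), returns the offset without writing it back, so it cannot overwrite an offset that a concurrent write() has just advanced. Offset overflow is ignored for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the kernel's shared open-file structure. */
struct model_file {
    off_t offset;   /* current file offset */
    off_t size;     /* file size, used for SEEK_END */
    int stores;     /* number of times the offset was written back */
};

/*
 * Model of a seek routine that skips the store when the computed
 * offset equals the current one.  Returns the resulting offset,
 * or -1 with errno set to EINVAL for a bad request.
 */
static off_t
model_lseek(struct model_file *fp, off_t off, int whence)
{
    off_t cur;
    off_t new_off;

    assert(fp != NULL);
    cur = fp->offset;           /* snapshot of the current offset */

    switch (whence) {
    case SEEK_SET:
        new_off = off;
        break;
    case SEEK_CUR:
        new_off = cur + off;
        break;
    case SEEK_END:
        new_off = fp->size + off;
        break;
    default:
        errno = EINVAL;
        return (-1);
    }
    if (new_off < 0) {
        errno = EINVAL;
        return (-1);
    }
    if (new_off != cur) {       /* store only when actually changing */
        fp->offset = new_off;
        fp->stores++;
    }
    return (new_off);
}
```

A seek that really does move the offset still races with a concurrent write() in this model; the special case only removes the store for the "tell me where I am" form of the call.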
Thank you very much! But I am wondering why there is no attempt to
synchronize access to the file_t structure between lseek and write.
When process 1 is trying to change the offset by doing lseek over fd1, if
another process attempts a write over fd2, which shares the file_t
structure with fd1, I thought the write should get blocked until the
lseek is complete. But I don't see any locking mechanism over the file_t
structure. Is it something that needs to be fixed? Or is this behavior
intentional?
It's been that way since the beginning of time (since 1994 at least).
I'm guessing that no one ever thought about it, and that people were
focused on maximizing concurrency in any case.
I found these words in the latest POSIX spec:
2.9.7 Thread Interactions with Regular File Operations
All of the following functions shall be atomic with respect
to each other in the effects specified in POSIX.1-2008 when
they operate on regular files or symbolic links:
chmod() fchownat() lseek() readv() unlink()
chown() fcntl() lstat() pwrite() unlinkat()
close() fstat() open() rename() utime()
creat() fstatat() openat() renameat() utimensat()
dup2() ftruncate() pread() stat() utimes()
fchmod() lchown() read() symlink() write()
fchmodat() link() readlink() symlinkat() writev()
fchown() linkat() readlinkat() truncate()
If two threads each call one of these functions, each call
shall either see all of the specified effects of the other
call, or none of them.
and these words in the previous POSIX spec:
2.9.7 Thread Interactions with Regular File Operations
All of the functions chmod(), close(), fchmod(), fcntl(),
fstat(), ftruncate(), lseek(), open(), read(), readlink(),
stat(), symlink(), and write() shall be atomic with respect
to each other in the effects specified in IEEE Std 1003.1-2001
when they operate on regular files. If two threads each call
one of these functions, each call shall either see all of the
specified effects of the other call, or none of them.
So it certainly appears that Solaris is out of conformance.
Some additional read()/write()/lseek() locking is needed
in the kernel.
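Until such locking exists in the kernel, threads in one process that share a file descriptor can serialize the calls themselves. The following is a minimal sketch of that workaround, not anything taken from Solaris: locked_read(), locked_write() and locked_lseek() are hypothetical wrapper names, and a single process-wide mutex is assumed for simplicity. It does nothing for separate processes that share a file_t.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* One process-wide lock (an assumption; a per-descriptor lock also works). */
static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

static void
io_enter(void)
{
    int rc = pthread_mutex_lock(&io_lock);

    assert(rc == 0);
    (void) rc;
}

static void
io_exit(void)
{
    int rc = pthread_mutex_unlock(&io_lock);

    assert(rc == 0);
    (void) rc;
}

/* read() with the shared offset protected by io_lock */
static ssize_t
locked_read(int fd, void *buf, size_t nbyte)
{
    ssize_t rval;

    io_enter();
    rval = read(fd, buf, nbyte);
    io_exit();
    return (rval);
}

/* write() with the shared offset protected by io_lock */
static ssize_t
locked_write(int fd, const void *buf, size_t nbyte)
{
    ssize_t rval;

    io_enter();
    rval = write(fd, buf, nbyte);
    io_exit();
    return (rval);
}

/* lseek() with the shared offset protected by io_lock */
static off_t
locked_lseek(int fd, off_t offset, int whence)
{
    off_t rval;

    io_enter();
    rval = lseek(fd, offset, whence);
    io_exit();
    return (rval);
}
```

Each wrapper makes only its own call atomic with respect to the others; a read followed by a write is still two separate steps.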
I concocted a test case for the atomicity of read(), as follows:
================ test case ==================
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>
#define INPUT "trash"
#define OUTPUT "junk"
#define MAX_THREADS 10
#define NUM_LINES 40
#define LINE_SIZE 80
#define MAX_BUFSZ (NUM_LINES * LINE_SIZE)
#define NUM_BUFFERS 1000
static char line[NUM_LINES][LINE_SIZE];
static int fd_in; /* input file */
static int fd_out; /* output file */
static void
make_input_data(void)
{
    char *p;
    int fd;
    int c = ' ';
    int i, j;

    for (i = 0; i < NUM_LINES; i++) {
        p = line[i];
        for (j = 0; j < LINE_SIZE - 1; j++)    /* printable ascii */
            if ((p[j] = c++) == 'z')
                c = ' ';
        p[LINE_SIZE - 1] = '\n';
    }
    if ((fd = open(INPUT, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
        perror(INPUT);
        exit(1);
    }
    for (i = 0; i < NUM_BUFFERS; i++)
        (void) write(fd, line, sizeof (line));
    (void) fsync(fd);
    (void) close(fd);
}
static void *
rd_wr_thread(void *arg)
{
    size_t size = (size_t)arg;
    ssize_t rval;
    int buf[MAX_BUFSZ / sizeof (int)];

    while ((rval = read(fd_in, buf, size)) > 0) {
        (void) write(fd_out, buf, rval);
    }
    return (NULL);
}
int
main(int argc, char **argv)
{
    pthread_t tid[MAX_THREADS];
    int nthreads = 4;
    size_t bsize;
    int i;

    if (argc >= 2 && (nthreads = atoi(argv[1])) > MAX_THREADS)
        nthreads = MAX_THREADS;
    make_input_data();
    if ((fd_in = open(INPUT, O_RDONLY)) < 0) {
        perror(INPUT);
        return (1);
    }
    if ((fd_out = open(OUTPUT, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND,
        0644)) < 0) {
        perror(OUTPUT);
        return (1);
    }
    for (i = 0; i < nthreads; i++) {
        bsize = MAX_BUFSZ * (i + 1) / nthreads;
        bsize = ((bsize + LINE_SIZE - 1) / LINE_SIZE) * LINE_SIZE;
        (void) pthread_create(&tid[i], NULL, rd_wr_thread,
            (void *)bsize);
    }
    for (i = 0; i < nthreads; i++)
        (void) pthread_join(tid[i], NULL);
    (void) close(fd_in);
    (void) close(fd_out);
    return (0);
}
=========================================
The test has several threads reading from one file descriptor,
each in units of a whole number of 80-byte lines, up to 40 lines
at a time (10, 20, 30 and 40 lines with the default four threads),
then writing to another file descriptor, all with no user-space locking.
Theoretically, this should produce an output file with exactly
the same lines as in the input file, but with the lines shuffled.
In fact, on a multiprocessor machine, the sizes of the two files
("trash" and "junk") are not the same. Solaris fails the test.
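For anyone repeating the test, the size comparison can be automated. The helper below is a minimal sketch; size_diff() is a hypothetical name, and it relies only on stat(). The input file written by make_input_data() is 1000 buffers of 40 * 80 bytes, that is 3,200,000 bytes, so a passing run should leave "junk" at the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Store (size of file b) - (size of file a) in *diff.
 * Returns 0 on success, or -1 if either stat() call fails
 * (errno is left as set by stat()).
 */
static int
size_diff(const char *a, const char *b, off_t *diff)
{
    struct stat sa, sb;

    assert(diff != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return (-1);
    *diff = sb.st_size - sa.st_size;
    return (0);
}
```

Calling size_diff("trash", "junk", &d) after a run and finding d != 0 indicates a failure. Equal sizes alone do not prove that every line arrived intact; sorting both files and comparing them line by line is the stronger check.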
I then tried the same test, unmodified, on a Linux machine:
2.6.35-22-generic #35-Ubuntu SMP GNU/Linux
and it exhibits the same problem. Ubuntu Linux fails the test.
I would appreciate it if some people out there would run this test
on other systems (Red Hat Linux, NetBSD, Apple OS X, HP-UX, IRIX)
and post the results.
Thanks,
Roger Faulkner
_______________________________________________
ksh93-integration-discuss mailing list
ksh93-integration-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/ksh93-integration-discuss