On 02/08/11 10:30, Srimannarayana Bhavanam wrote:
> Roger A. Faulkner wrote:
>> Not literally, but yes, this is the way Solaris behaves, down in the kernel.
>> The cause is that all of the [l]lseek() calls end up going through
>> common code that ends up setting the file offset to the
>> newly-computed offset.  The code needs to make a special case of this
>> and not set the file offset if it is not actually being asked to change.
>
> Thank you very much! But I am wondering why there is no attempt to
> synchronize access to the file_t structure between lseek and write.
>
> When process 1 is trying to change the offset by doing an lseek over fd1,
> and another process attempts a write over fd2, which shares a file_t
> structure with fd1, I thought the write should be blocked until the
> lseek is complete. But I don't see any locking mechanism over the
> file_t structure. Is it something that needs to be fixed? Or is this
> behavior intentional?

It's been that way since the beginning of time (since 1994 at least).
I'm guessing that no one ever thought about it and that people were
focused on maximizing concurrency in any case.

I found these words in the latest POSIX spec:

2.9.7 Thread Interactions with Regular File Operations

    All of the following functions shall be atomic with respect
    to each other in the effects specified in POSIX.1-2008 when
    they operate on regular files or symbolic links:

    chmod()    fchownat()  lseek()      readv()     unlink()
    chown()    fcntl()     lstat()      pwrite()    unlinkat()
    close()    fstat()     open()       rename()    utime()
    creat()    fstatat()   openat()     renameat()  utimensat()
    dup2()     ftruncate() pread()      stat()      utimes()
    fchmod()   lchown()    read()       symlink()   write()
    fchmodat() link()      readlink()   symlinkat() writev()
    fchown()   linkat()    readlinkat() truncate()

    If two threads each call one of these functions, each call
    shall either see all of the specified effects of the other
    call, or none of them.

and these words in the previous POSIX spec:

2.9.7 Thread Interactions with Regular File Operations

    All of the functions chmod(), close(), fchmod(), fcntl(),
    fstat(), ftruncate(), lseek(), open(), read(), readlink(),
    stat(), symlink(), and write() shall be atomic with respect
    to each other in the effects specified in IEEE Std 1003.1-2001
    when they operate on regular files. If two threads each call
    one of these functions, each call shall either see all of the
    specified effects of the other call, or none of them.

So it certainly appears that Solaris is out of conformance.
Some additional read()/write()/lseek() locking is needed
in the kernel.
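
To illustrate the kind of locking meant here, the following is a minimal
user-space sketch, not Solaris kernel code. The names model_file,
model_file_init, model_read, and model_seek_set are hypothetical. The only
point is that the shared offset is read, used, and advanced under one lock,
so each caller sees all or none of another call's effect on the offset.

```c
#include <sys/types.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a shared open-file object (not the real file_t). */
struct model_file {
        pthread_mutex_t lock;   /* protects offset */
        off_t offset;           /* shared file offset */
        int fd;                 /* underlying descriptor, used via pread() */
};

static void
model_file_init(struct model_file *mf, int fd)
{
        (void) pthread_mutex_init(&mf->lock, NULL);
        mf->offset = 0;
        mf->fd = fd;
}

/* Read at the shared offset and advance it, as one atomic step. */
static ssize_t
model_read(struct model_file *mf, void *buf, size_t size)
{
        ssize_t rval;

        (void) pthread_mutex_lock(&mf->lock);
        rval = pread(mf->fd, buf, size, mf->offset);
        if (rval > 0)
                mf->offset += rval;
        (void) pthread_mutex_unlock(&mf->lock);
        return (rval);
}

/* Set the shared offset (SEEK_SET semantics only), under the same lock. */
static off_t
model_seek_set(struct model_file *mf, off_t off)
{
        (void) pthread_mutex_lock(&mf->lock);
        mf->offset = off;
        (void) pthread_mutex_unlock(&mf->lock);
        return (off);
}
```

Holding the lock across the I/O serializes readers of the same open file,
which is the cost in concurrency that such a fix would have to weigh.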

I concocted a test case for the atomicity of read(), as follows:

================  test case  ==================
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>

#define INPUT   "trash"
#define OUTPUT  "junk"

#define MAX_THREADS     10

#define NUM_LINES       40
#define LINE_SIZE       80
#define MAX_BUFSZ       (NUM_LINES * LINE_SIZE)
#define NUM_BUFFERS     1000

static char line[NUM_LINES][LINE_SIZE];

static int fd_in;       /* input file */
static int fd_out;      /* output file */

static void
make_input_data(void)
{
        char *p;
        int fd;
        int c = ' ';
        int i, j;

        for (i = 0; i < NUM_LINES; i++) {
                p = line[i];
                for (j = 0; j < LINE_SIZE - 1; j++) /* printable ascii */
                        if ((p[j] = c++) == 'z')
                                c = ' ';
                p[LINE_SIZE - 1] = '\n';
        }
        if ((fd = open(INPUT, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
                perror(INPUT);
                exit(1);
        }
        for (i = 0; i < NUM_BUFFERS; i++)
                (void) write(fd, line, sizeof (line));
        (void) fsync(fd);
        (void) close(fd);
}

static void *
rd_wr_thread(void *arg)
{
        size_t size = (size_t)arg;
        ssize_t rval;
        int buf[MAX_BUFSZ / sizeof (int)];

        while ((rval = read(fd_in, buf, size)) > 0) {
                (void) write(fd_out, buf, rval);
        }

        return (NULL);
}

int
main(int argc, char **argv)
{
        pthread_t tid[MAX_THREADS];
        int nthreads = 4;
        size_t bsize;
        int i;

        if (argc >= 2) {
                nthreads = atoi(argv[1]);
                if (nthreads < 1)       /* non-numeric or non-positive */
                        nthreads = 1;
                if (nthreads > MAX_THREADS)
                        nthreads = MAX_THREADS;
        }

        make_input_data();

        if ((fd_in = open(INPUT, O_RDONLY)) < 0) {
                perror(INPUT);
                return (1);
        }

        if ((fd_out = open(OUTPUT, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND,
            0644)) < 0) {
                perror(OUTPUT);
                return (1);
        }

        for (i = 0; i < nthreads; i++) {
                bsize = MAX_BUFSZ * (i + 1) / nthreads;
                bsize = ((bsize + LINE_SIZE - 1) / LINE_SIZE) * LINE_SIZE;
                (void) pthread_create(&tid[i], NULL, rd_wr_thread,
                    (void *)bsize);
        }

        for (i = 0; i < nthreads; i++)
                (void) pthread_join(tid[i], NULL);

        (void) close(fd_in);
        (void) close(fd_out);

        return (0);
}
=========================================

The test has several threads reading from one file descriptor,
each in its own fixed unit of a whole number of 80-byte lines
(up to 40 lines per read), then writing to another file descriptor,
all with no user-space locking.
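
For concreteness, the per-thread read size computed in main() can be
reproduced on its own. This is a small sketch; the helper name chunk_size
is hypothetical, and the constants are copied from the test case.

```c
#include <stddef.h>

#define NUM_LINES       40
#define LINE_SIZE       80
#define MAX_BUFSZ       (NUM_LINES * LINE_SIZE)

/*
 * Bytes read per call by thread i (0-based) of nthreads: an even share
 * of MAX_BUFSZ, rounded up to a whole number of LINE_SIZE-byte lines.
 */
static size_t
chunk_size(int i, int nthreads)
{
        size_t bsize = (size_t)MAX_BUFSZ * (i + 1) / nthreads;

        return (((bsize + LINE_SIZE - 1) / LINE_SIZE) * LINE_SIZE);
}
```

With the default of 4 threads this gives 800, 1600, 2400, and 3200 bytes,
that is, 10, 20, 30, and 40 lines per read.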

Theoretically, this should produce an output file with exactly
the same lines as in the input file, but with the lines shuffled.

In fact, on a multiprocessor machine, the sizes of the two files
("trash" and "junk") are not the same.  Solaris fails the test.
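
The size comparison can be automated. The following is a minimal sketch;
the helper name same_size is hypothetical, and it checks only the sizes,
not whether the lines themselves match.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if the two files have the same size, 0 if the sizes differ,
 * and -1 if either file cannot be examined.
 */
static int
same_size(const char *path_a, const char *path_b)
{
        struct stat sa, sb;

        if (stat(path_a, &sa) != 0 || stat(path_b, &sb) != 0)
                return (-1);
        return (sa.st_size == sb.st_size);
}
```

After a run, same_size("trash", "junk") returning 0 indicates the
failure described above.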

I then tried the same test, unmodified, on a Linux machine:
    2.6.35-22-generic #35-Ubuntu SMP GNU/Linux
and it exhibits the same problem.  Ubuntu Linux fails the test.

I would appreciate it if some people out there would run this test
on other systems (Red Hat Linux, NetBSD, Apple OS X, HP-UX, IRIX)
and post the results.

Thanks,
Roger Faulkner

