Hi,

I am currently investigating a problem related to Mercurial:
https://bz.mercurial-scm.org/show_bug.cgi?id=6035

While running an update operation, Mercurial can abort unexpectedly.

To do its work, it starts workers and communicates with them over a
socket file, reading until EOF. It uses the cPickle Python extension to
serialize and deserialize messages.

It seems that when a worker exits after sending its last message,
because it has finished its work, EOF detection doesn't work as
expected. The main process interprets this as a failure and aborts. The
errno value is 35: EAGAIN.

The cPickle code seems correct:

pobj/Python-2.7.15/Python-2.7.15/Modules/cPickle.c

from read_file():
551     PyFile_IncUseCount((PyFileObject *)self->file);
552     Py_BEGIN_ALLOW_THREADS
553     nbytesread = fread(self->buf, sizeof(char), n, self->fp);
554     Py_END_ALLOW_THREADS
555     PyFile_DecUseCount((PyFileObject *)self->file);
556     if (nbytesread != (size_t)n) {
557         asm("int $3");
558         if (feof(self->fp)) {
559             PyErr_SetNone(PyExc_EOFError);
560             return -1;
561         }
562
563         PyErr_SetFromErrno(PyExc_IOError);
564         return -1;
565     }
566

(the asm() is mine).

It uses fread() and reads 1 byte at a time.

After the worker writes its last message, it exits. Looking at the
ktrace output, the exit(2) call happens before the main process has
finished reading the buffer. That doesn't seem to be a problem.

The last call to fread(3) returns 0. As it asked for 1, the condition at
line 556 is true and gdb catches the breakpoint.

(gdb) print *self->fp
$1 = {
  _p = 0x1582c6a5250 <usual+272> "",
  _r = 0,
  _w = 0,
  _flags = 6,
  _file = 5,
  _bf = {
    _base = 0x1582c6a524f <usual+271> "\n",
    _size = 1
  },
  _lbfsize = 0,
  _cookie = 0x1582c6a51d8 <usual+152>,
  _close = 0x1582c5f3a00 <__sclose>,
  _read = 0x1582c5f38f0 <__sread>,
  _seek = 0x1582c5f39a0 <__sseek>,
  _write = 0x1582c5f3940 <__swrite>,
  _ext = {
    _base = 0x1582c6a3eb8 <usualext+296> "",
    _size = 0
  },
  _up = 0x0,
  _ur = 0,
  _ubuf = "\000\000",
  _nbuf = "\n",
  _lb = {
    _base = 0x0,
    _size = 0
  },
  _blksize = 16384,
  _offset = 1833023
}

The interesting part is _flags. It is 6.

According to stdio.h, 6 = __SNBF | __SRD, so "unbuffered" and "OK to read".

The feof() call returns false, so the Python code treats the short read
as an error.

When looking at the fread(3) code in libc, I found that we don't set
__SEOF when the FILE is unbuffered.

src/lib/libc/stdio/fread.c
        if ((fp->_flags & __SNBF) != 0) {
                /*
                 * We know if we're unbuffered that our buffer is empty, so
                 * we can just read directly. This is much faster than the
                 * loop below which will perform a series of one byte reads.
                 */
                while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
                        p += r;
                        resid -= r;
                }
                FUNLOCKFILE(fp);
                return ((total - resid) / size);
        }
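
To make the missing step concrete, here is a small standalone model of
what I would expect an unbuffered read loop to record when it stops
early. This is only a sketch, not a libc patch: the struct ustream and
the function ustream_read() are names made up for this example, and it
calls read(2) directly instead of going through fp->_read.

```c
#include <sys/types.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for an unbuffered FILE: a descriptor plus two sticky flags. */
struct ustream {
	int fd;
	int eof;	/* set once read(2) has returned 0 */
	int error;	/* set once read(2) has returned -1 */
};

/*
 * Read up to n bytes with the same loop shape as the unbuffered branch
 * of fread(3), but remember why the loop stopped before n bytes arrived.
 * Returns the number of bytes actually read.
 */
static size_t
ustream_read(struct ustream *us, void *buf, size_t n)
{
	char *p = buf;
	size_t resid = n;
	ssize_t r = 1;

	while (resid > 0 && (r = read(us->fd, p, resid)) > 0) {
		p += r;
		resid -= (size_t)r;
	}

	/* A short count means the last read(2) returned 0 (EOF) or -1 (error). */
	if (resid > 0) {
		if (r == 0)
			us->eof = 1;
		else
			us->error = 1;
	}

	return n - resid;
}
```

With this model, a caller that gets a short count can tell "the other
side closed" from "the read failed", which is the distinction the
cPickle code relies on feof() to make.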


I am able to reproduce it in plain C:

$ cat test.c

#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
        FILE * fp = stdin;
        char buf[1024];
        size_t nread;
        size_t n = 1;

        if (setvbuf(fp, NULL, _IONBF, 0) != 0)
                err(EXIT_FAILURE, "setvbuf");

        for (;;) {
                nread = fread(buf, sizeof(char), n, fp);
                if (nread != n) {
                        if (feof(fp)) {
                                printf("EOF\n");
                                break;
                        }
                        
                        if (ferror(fp))
                                err(EXIT_FAILURE, "ferror");

                        /* short read, but neither EOF nor error is flagged */
                        errx(EXIT_FAILURE, "something else");
                }
                
        }

        return EXIT_SUCCESS;
}

$ cc -Wall test.c && ./a.out
^D
a.out: something else

Is it a bug that the __SEOF flag is not set, or is this expected for an
unbuffered FILE?

Thanks.
-- 
Sebastien Marie
