Hi,
I am currently investigating a problem related to Mercurial:
https://bz.mercurial-scm.org/show_bug.cgi?id=6035
An update operation can end unexpectedly.
To do its work, it starts workers and communicates with them over a
socket file, reading until EOF. It uses the cPickle Python extension to
serialize and deserialize messages.
It seems that when a worker exits after sending its last message,
because it has finished its work, EOF detection doesn't work as
expected. The main process interprets this as a failure and aborts. The
errno value is 35: EAGAIN.
The cPickle code seems correct:
pobj/Python-2.7.15/Python-2.7.15/Modules/cPickle.c
from read_file():
551     PyFile_IncUseCount((PyFileObject *)self->file);
552     Py_BEGIN_ALLOW_THREADS
553     nbytesread = fread(self->buf, sizeof(char), n, self->fp);
554     Py_END_ALLOW_THREADS
555     PyFile_DecUseCount((PyFileObject *)self->file);
556     if (nbytesread != (size_t)n) {
557         asm("int $3");
558         if (feof(self->fp)) {
559             PyErr_SetNone(PyExc_EOFError);
560             return -1;
561         }
562
563         PyErr_SetFromErrno(PyExc_IOError);
564         return -1;
565     }
566
(The asm() is mine; it acts as a breakpoint.)
It uses fread() and reads 1 byte at a time.
After the worker writes its last message, it exits. Looking at the
ktrace output, the exit(2) call happens before the main process has
finished reading the buffer. That doesn't seem to be a problem.
The last call to fread(3) returns 0. As it asked for 1, the condition at
line 556 is true and gdb catches the breakpoint.
(gdb) print *self->fp
$1 = {
_p = 0x1582c6a5250 <usual+272> "",
_r = 0,
_w = 0,
_flags = 6,
_file = 5,
_bf = {
_base = 0x1582c6a524f <usual+271> "\n",
_size = 1
},
_lbfsize = 0,
_cookie = 0x1582c6a51d8 <usual+152>,
_close = 0x1582c5f3a00 <__sclose>,
_read = 0x1582c5f38f0 <__sread>,
_seek = 0x1582c5f39a0 <__sseek>,
_write = 0x1582c5f3940 <__swrite>,
_ext = {
_base = 0x1582c6a3eb8 <usualext+296> "",
_size = 0
},
_up = 0x0,
_ur = 0,
_ubuf = "\000\000",
_nbuf = "\n",
_lb = {
_base = 0x0,
_size = 0
},
_blksize = 16384,
_offset = 1833023
}
The interesting part is _flags. It is 6.
According to stdio.h, 6 = __SNBF | __SRD, so "unbuffered" and "OK to read".
__SEOF is not set, so the feof() call returns false and the Python code
treats the short read as an error.
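To make the decoding of _flags explicit, here is a small sketch. The SK_* names and the two helper functions are hypothetical, and the bit values are local copies of what I understand the BSD stdio.h to define, so please check them against your own header.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Local copies of the BSD <stdio.h> flag bits (assumed values, renamed
 * so that they cannot clash with the real macros).
 */
#define SK_SLBF	0x0001	/* line buffered */
#define SK_SNBF	0x0002	/* unbuffered */
#define SK_SRD	0x0004	/* OK to read */
#define SK_SWR	0x0008	/* OK to write */
#define SK_SEOF	0x0020	/* found EOF */
#define SK_SERR	0x0040	/* found error */

/* Return 1 if `bit` is set in `flags`, else 0. */
int
sk_has(int flags, int bit)
{
	return (flags & bit) != 0;
}

/* Format the four bits of interest for a given _flags value into `out`. */
void
sk_describe(int flags, char *out, size_t len)
{
	int n;

	n = snprintf(out, len, "unbuffered=%d readable=%d eof=%d error=%d",
	    sk_has(flags, SK_SNBF), sk_has(flags, SK_SRD),
	    sk_has(flags, SK_SEOF), sk_has(flags, SK_SERR));
	assert(n >= 0 && (size_t)n < len);	/* buffer must be large enough */
}
```

Calling sk_describe(6, buf, sizeof(buf)) with the value from the gdb dump shows that neither the EOF bit nor the error bit is set, which is why both feof() and ferror() have nothing to report.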
Looking at the fread(3) code in libc, I found that we don't set
__SEOF when the FILE is unbuffered: the unbuffered path calls
fp->_read directly and returns without touching fp->_flags.
src/lib/libc/stdio/fread.c
	if ((fp->_flags & __SNBF) != 0) {
		/*
		 * We know if we're unbuffered that our buffer is empty, so
		 * we can just read directly. This is much faster than the
		 * loop below which will perform a series of one byte reads.
		 */
		while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
			p += r;
			resid -= r;
		}
		FUNLOCKFILE(fp);
		return ((total - resid) / size);
	}
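For comparison, here is a standalone sketch of the same kind of read loop that records why it stopped. It is not a patch for libc: it works on a plain file descriptor with read(2), and the names sk_read_full, sk_demo_short_read, SK_EOF and SK_ERR are hypothetical. It only illustrates the distinction that the loop above discards, namely a return of 0 (EOF) versus a negative return (error).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define SK_EOF	0x1	/* read(2) returned 0 */
#define SK_ERR	0x2	/* read(2) returned -1 */

struct sk_result {
	size_t got;	/* bytes actually stored */
	int state;	/* 0, SK_EOF or SK_ERR */
};

/*
 * Read up to n bytes from fd, looping over short reads, and record
 * whether the loop stopped because of EOF or because of an error.
 */
struct sk_result
sk_read_full(int fd, void *buf, size_t n)
{
	struct sk_result res = { 0, 0 };
	char *p = buf;
	size_t resid = n;
	ssize_t r;

	while (resid > 0) {
		r = read(fd, p, resid);
		if (r > 0) {
			p += r;
			resid -= (size_t)r;
			continue;
		}
		res.state = (r == 0) ? SK_EOF : SK_ERR;
		break;
	}
	res.got = n - resid;
	return res;
}

/* Demo: the writer sends 3 bytes and closes; the reader asks for 5. */
struct sk_result
sk_demo_short_read(void)
{
	struct sk_result res;
	int fds[2];
	char buf[5];
	ssize_t w;
	int rc;

	rc = pipe(fds);
	assert(rc == 0);
	w = write(fds[1], "abc", 3);
	assert(w == 3);
	close(fds[1]);		/* no writer left: the next read(2) sees EOF */

	res = sk_read_full(fds[0], buf, sizeof(buf));
	close(fds[0]);
	return res;
}
```

In the demo the reader gets 3 of the 5 bytes it asked for and the state is SK_EOF, so a caller can tell a short read at end of stream from a failure.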
I am able to reproduce it in plain C:
$ cat test.c
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
	FILE *fp = stdin;
	char buf[1024];
	size_t nread;
	size_t n = 1;

	/* make stdin unbuffered, as in the failing case */
	if (setvbuf(fp, NULL, _IONBF, 0) != 0)
		err(EXIT_FAILURE, "setvbuf");

	for (;;) {
		nread = fread(buf, sizeof(char), n, fp);
		if (nread != n) {
			/* short read: find out why */
			if (feof(fp)) {
				printf("EOF\n");
				break;
			}
			if (ferror(fp))
				err(EXIT_FAILURE, "ferror");
			/* neither EOF nor error indicator is set */
			errx(EXIT_FAILURE, "something else");
		}
	}

	return EXIT_SUCCESS;
}
$ cc -Wall test.c && ./a.out
^D
a.out: something else
Is it a bug that the __SEOF flag is not set, or is this expected for an
unbuffered FILE?
Thanks.
--
Sebastien Marie