OK, I tried some of the suggestions Glynn made below:

> I'm referring to the situation where read/write return a short count,
> e.g. "write(fd, buf, count) < count". But, I've remembered that XDR
> uses stdio rather than POSIX I/O, and I don't think that fread/fwrite
> can return a short count (except for EOF).
>
> According to the MSVCRT documentation, the O_NOINHERIT flag should be
> used when using _dup2(), e.g.:
>
>	_pipe(p1, 250000, _O_BINARY|_O_NOINHERIT)
>
> Another suggestion: try changing the size passed to the _pipe()
> function in dbmi_client/start.c. If that affects the tendency to
> deadlock, it strongly suggests that the issue is related to the way
> that a full pipe is handled.
>
> Beyond that, the only thing which I can suggest is to instrument the
> XDR code with debug code to log all I/O operations (including the data
> which is read/written).
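To make the full-pipe scenario concrete, here is a minimal sketch. It assumes a POSIX system and uses pipe() rather than the MSVCRT _pipe() discussed above, so it only illustrates the general mechanism and says nothing about how Windows sizes or handles its pipes. fill_pipe() and drain_pipe() are hypothetical helper names, not GRASS or XDR functions. The write end is put into non-blocking mode so that a full pipe shows up as EAGAIN; with a blocking descriptor the same write() would simply hang until a reader makes room.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write to wfd until the pipe refuses more data.
 * Returns the number of bytes accepted, or -1 on an unexpected error. */
static long fill_pipe(int wfd)
{
    char chunk[4096];
    long total = 0;
    int flags = fcntl(wfd, F_GETFL);

    /* Non-blocking mode: a full pipe is reported instead of hanging. */
    if (flags == -1 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    memset(chunk, 'x', sizeof chunk);

    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof chunk);

        if (n > 0) {
            total += n;         /* n may be a short count (< sizeof chunk) */
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;           /* interrupted by a signal, just retry */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;       /* pipe is full: nobody has made room */
        return -1;              /* any other outcome is unexpected here */
    }
}

/* Hypothetical helper: read up to `count` bytes from rfd and discard them.
 * Returns the number of bytes actually removed from the pipe. */
static long drain_pipe(int rfd, long count)
{
    char sink[4096];
    long done = 0;

    assert(count >= 0);
    while (done < count) {
        long left = count - done;
        size_t want = left < (long)sizeof sink ? (size_t)left : sizeof sink;
        ssize_t n = read(rfd, sink, want);

        if (n > 0)
            done += n;
        else if (n == -1 && errno == EINTR)
            continue;           /* interrupted by a signal, just retry */
        else
            break;              /* EOF or error: stop draining */
    }
    return done;
}
```

Used on a fresh pipe that nobody reads from, a second fill_pipe() call right after the first should accept nothing. Once drain_pipe() has removed the data, the writer can make progress again. That is the "puller waits for pusher" dependency in miniature.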
After hundreds of test runs with different Windows versions, these are
my conclusions:

The problem has to do with the pipe mechanism in Windows. I tried
changing the pipe size as suggested, using extremely small (25) and
extremely large (250000000) values. On Windows 2000, with the very small
value, no module run makes it past 33 percent. So there is a clear
correlation.

As soon as I set it to some "sane" value (at least 25000), I get the
same situation as before: about 4 to 6 out of 50 runs complete.
Increasing the value further makes no difference; the variation is
always within measuring precision. This is no surprise, since the
comment in dbmi_client/start.c states that the pipe buffer value is not
directly related to the pipe size. Apparently, Windows chooses some
fixed value as soon as the size is greater than some threshold. The same
thing happens when I set the size to "0".

However, the fact that I can block the piping effectively with very
small values leaves me believing that this is, as Glynn suggests, the
source of the trouble: a full pipe gets stuck and no process ever takes
anything out of it to make some room, so the next bit of data cannot be
pushed into it. Puller waits for pusher, pusher never pushes, because
nothing gets pulled = deadlock. (I think...)

Another thing makes me believe that Windows itself is the culprit here:
I tested the same stuff on a Windows XP SP2 system, clean install from
scratch. On this system, almost all the runs (97%) finished cleanly!
Obviously MS made some improvements to process communication in that
release ...

Setting the _O_NOINHERIT flag makes no difference.

So, how are we going to go ahead?

Best,

Benjamin

-- 
Benjamin Ducke, M.A.
Archäoinformatik (Archaeoinformation Science)
Institut für Ur- und Frühgeschichte
(Inst. of Prehistoric and Historic Archaeology)
Christian-Albrechts-Universität zu Kiel
Johanna-Mestorf-Straße 2-6
D 24098 Kiel
Germany

Tel.: ++49 (0)431 880-3378 / -3379
Fax : ++49 (0)431 880-7300
www.uni-kiel.de/ufg

