On 3/2/2023 5:39 PM, Ken Brown wrote:
> I'm returning to this thread because a surprising thing happened. I
> decided to try to debug the fifo problem I reported at the beginning of
> the thread (a hang building TeX Live on Cygwin when the jobserver used a
> fifo). So I installed make 4.4.1 built with fifos enabled (by setting
> CPPFLAGS=-DJOBSERVER_USE_FIFO=1). And now I can no longer reproduce the
> hang.
Update: The hang occurred again. It appears to be caused by an infinite
loop starting with a call to pselect[*]. I looked briefly at the code
that calls pselect, and I suspect that there is a timing issue. Perhaps
certain operations that are supposed to be atomic on POSIX platforms are
not atomic on Cygwin. (Unfortunately, Cygwin's fifo implementation is
extremely complicated in order to support multiple readers and writers,
and atomicity had to be sacrificed.)
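
To illustrate the kind of race suspected here, the following is a minimal
sketch of a "wait with pselect, then read one token byte" step. It is not
make's actual code: try_acquire_token is a hypothetical name, and the sketch
assumes the descriptor is non-blocking and that a token is a single byte.
With several readers on one fifo, pselect can report the descriptor as
readable and another reader can still take the byte first; a caller that
retries immediately on a 0 return could then spin if readiness keeps being
reported when no byte can actually be read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not taken from make.
   Waits until fd is readable (or the timeout expires), then tries to
   read one token byte.  fd is assumed to be opened with O_NONBLOCK.
   Returns 1 with *tok set, 0 if no token was obtained this time
   (timeout, interrupted, or another reader got there first),
   and -1 on a real error or end of file.  */
int try_acquire_token(int fd, const struct timespec *timeout, char *tok)
{
    fd_set readfds;
    ssize_t n;
    int r;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* NULL sigmask: the signal mask is left unchanged in this sketch.  */
    r = pselect(fd + 1, &readfds, NULL, NULL, timeout, NULL);
    if (r < 0)
        return errno == EINTR ? 0 : -1;
    if (r == 0)
        return 0;               /* timed out, nothing readable */

    /* Readiness and the read below are two separate steps, so another
       reader may consume the byte in between.  */
    n = read(fd, tok, 1);
    if (n == 1)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EINTR))
        return 0;               /* lost the race; caller decides whether to retry */
    return -1;                  /* error, or EOF (no writers left) */
}
```

A caller would normally loop on a 0 return; bounding that loop, or backing
off between attempts, is one way to keep a misreported readiness from
turning into a busy loop.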
If I'm right, the solution would seem to be to disable the use of
pselect on Cygwin when the jobserver is using a fifo. I'll try that on
a local build of make and see if I can still reproduce the problem. It
might be several weeks until I'm confident, since the hang occurs only
sporadically and only after about 90 minutes of running the TeX Live build.
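
As a rough sketch of what such a change could look like (an assumption
about its shape, not the actual patch): the helper name
jobserver_may_use_pselect is hypothetical, and the only real identifier
used is the __CYGWIN__ macro that Cygwin's compiler predefines.

```c
#include <stdbool.h>

/* Hypothetical policy helper, not taken from make.
   Decides whether the jobserver should wait with pselect.
   On Cygwin, pselect is avoided when the jobserver uses a fifo;
   everywhere else it is allowed.  */
bool jobserver_may_use_pselect(bool using_fifo)
{
#ifdef __CYGWIN__
    return !using_fifo;
#else
    (void) using_fifo;          /* unused on other platforms */
    return true;
#endif
}
```

The code that acquires a token would consult this once and fall back to
whatever non-pselect path it already has when the answer is false.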
Ken
[*] I say "appears to be" because I was running an optimized build of
make and an optimized build of the Cygwin DLL, so the gdb backtrace
might not be reliable.