Simon Marlow wrote:

> > > > Unfortunately, select() (and hence the GHC RTS) doesn't identify
> > > > the bad descriptor(s).  Here's where I suspect my program may be
> > > > going awry.  The main process creates a pipe.  The process then
> > > > forks.  The parent closes the pipe's read descriptor immediately.
> > > > The child soon goes to read from the pipe, using threadWaitRead
> > > > followed by fdRead.  The child process suffers the select failure
> > > > shown above.
> > >
> > > So.. I take it the child shouldn't really be reading from a closed
> > > file descriptor?
> >
> > The file descriptor is the read end of a pipe used to send data from
> > the parent to the child.  The parent closes it because it will never
> > use it, but only after the parent forks.  So the child's copy of the
> > file descriptor should still be open, n'est-ce pas?
>
> Yes, seems reasonable to me.  Are there any other file descriptors that
> you are closing?  Are you doing any lazy I/O?

"Yes" and "yes".  Your questions suggest a new hypothesis which I'll mention
here for your thoughts and, in parallel, test out on my program.  Suppose
the parent process has a thread blocked on `threadWaitRead` at the time it
forks.  In the new process, my cleanup code closes the file descriptor on
which that thread is waiting, but the thread itself is still blocked on it.
The next select() call fails because one of the designated file descriptors
is now invalid.
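
To make the hypothesis concrete, here is a minimal C sketch of the failure
mode.  It is not the RTS code; `poll_read_set` is a hypothetical helper
that issues one zero-timeout select() over a set of read descriptors, the
way a select()-based scheduler might, and reports the resulting errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>   /* pipe(), close() for callers of this sketch */

/* Hypothetical helper: one zero-timeout select() over n read descriptors.
 * Returns 0 if select() succeeded, otherwise the errno value it set. */
static int poll_read_set(const int *fds, size_t n)
{
    fd_set rfds;
    struct timeval tv = {0, 0};   /* zero timeout: never block */
    int maxfd = -1;
    size_t i;

    FD_ZERO(&rfds);
    for (i = 0; i < n; i++) {
        /* FD_SET is only defined for descriptors in [0, FD_SETSIZE). */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return errno;             /* e.g. EBADF if any descriptor is closed */
    return 0;
}
```

If one descriptor in the set is closed between two calls, the second call
returns EBADF for the whole set, with no indication of which descriptor was
at fault.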

If the above hypothesis is true, then my schemes for cleaning up unwanted
file descriptors and threads in the new process are not playing together
well enough.  To clean up unwanted file descriptors, I keep track of open
file descriptors in the parent process and close unwanted ones in the child
process.  To clean up unwanted threads, I have a thread obtain and hold a
process-wide lock while performing a side-effecting operation; in a forked
child process, the lock tells preexisting threads to commit suicide instead
of performing their side-effecting operation.

When a thread wants to read from a file descriptor, its logic looks like:

        threadWaitRead (fdToInt fd)
        ([char], 1) <- locked (fdRead fd 1)

where `locked` obtains and holds the aforementioned lock for the duration of
its argument action.

Reflecting on the above, I now realize that the recent change
(/fptools/ghc/rts/Select.c?rev=1.22 in GHC 5.04) to wake up all threads when
select() returns an EBADF error, though well-intentioned, is inappropriate.
The point of `threadWaitRead` and `threadWaitWrite` is to block the calling
thread until it's known that a subsequent call involving the given file
descriptor will not block.  Waking all threads--even those whose file
descriptor is not yet ready--invites exactly the deadlock that
`threadWaitRead` and `threadWaitWrite` are designed to avoid.

I consider it a bug in select() that, when EBADF is reported, the sets of
"ready" file descriptors are not also reported.  Fortunately, I think
select() can be "fixed" (albeit clumsily) in the GHC RTS.  When EBADF is
reported, cycle through all file descriptors that were presented to
select(), invoking select() with each single file descriptor (and with zero
timeout).  At least one of these calls should report EBADF; wake up the
threads corresponding to all such calls.  (It's probably worth waking up
threads corresponding to normal returns from select(), too, while you're at
it.)
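
The per-descriptor probing just described might look roughly like the
following C sketch.  It is only an illustration, not a patch against
Select.c: the names `fd_state`, `probe_read_fd` and `probe_read_fds` are
made up here, only the read set is handled, and the RTS would still need to
map each result back to the threads waiting on that descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>   /* pipe(), write(), close() for callers of this sketch */

/* Hypothetical classification of one descriptor after a probe. */
enum fd_state { FD_NOT_READY, FD_READY, FD_BAD };

/* Probe a single descriptor for readability with a zero timeout. */
static enum fd_state probe_read_fd(int fd)
{
    fd_set rfds;
    struct timeval tv;
    int r;

    /* FD_SET is only defined for descriptors in [0, FD_SETSIZE). */
    assert(fd >= 0 && fd < FD_SETSIZE);

    do {
        /* Reinitialise each time: select() may modify both arguments. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = 0;
        tv.tv_usec = 0;
        r = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return errno == EBADF ? FD_BAD : FD_NOT_READY;
    return r > 0 ? FD_READY : FD_NOT_READY;
}

/* After a whole-set select() has failed with EBADF, classify every
 * descriptor that was in the set.  Threads waiting on FD_BAD or FD_READY
 * entries would be woken; FD_NOT_READY entries stay blocked. */
static void probe_read_fds(const int *fds, size_t n, enum fd_state *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = probe_read_fd(fds[i]);
}
```

The cost is one extra select() call per waiting descriptor, but only on the
EBADF path, which should be rare.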

Using select() as just described would regain for `threadWaitRead` and
`threadWaitWrite` their necessary semantic guarantee.  I think it would also
solve my problem.  In the meantime, I will investigate killing unwanted
threads with `killThread` in the child process, before closing unwanted file
descriptors, which should avoid occurrences of EBADF from select().

It sure would simplify my program if I could fork a process and not have
auxiliary threads persist in the child.  Could this option be provided by
GHC RTS in a semantically sound way?

Thanks for asking the right questions!

Dean

_______________________________________________
Glasgow-haskell-bugs mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs