On 02/05/20(Sat) 16:02, Mark Kettenis wrote:
> > Date: Sat, 2 May 2020 11:33:17 +0200
> > From: Martin Pieuchot <[email protected]>
> > [...]
> > Do we see that the issue is caused by the order in which descriptors are
> > closed in fdfree()? The current deadlock occurs because the duped master
> > has a higher fd number than the slave which means it is still open when the
> > slave is closed.
>
> I'm sure we could construct an example where the file descriptors are
> in a different oder. So changing the order is not going to help.
Obviously :)
> > But why would that be a problem? By default *close() functions,
> > including ttylclose() are blocking. So any exiting process might end up
> > hanging in fdfree(). Diff below illustrates that by forcing all *close()
> > during exit1() to be non-blocking, it also fix the issue.
>
> I very much fear that is going to have unintended side-effects with
> output not being flushed properly. And the process could still
> deadlock itself by using close(2) directly isn't it?
Indeed.
> > Does it make sense to close fds as non-blocking when existing? What
> > should a dying thread wait for? What can be the cons of such approach?
> >
> > Now regarding your fix, why does it make sense to wait 5sec instead of
> > indefinitely? Did you look at r1.263 of NetBSD's kern/tty.c? If we go
> > with this change could you please change the 'timo' suffix and variables
> > to 'nsec' and use uint64_t instead of int?
>
> r1.263 was reverted in r1.264. Then r1.265 is the commit quoted by anton@.
> There is also r2.267 which adds an additional fix to r1.265.
>
> In ttywait(), NetBSD only calls ttyflush() if there is a timeout. That
> makes sense, because we have ttywflush() to combine the wait and flush
> so ttywait() shouldn't flush when there is no error.
Updated diff below reflecting those changes. I'm still questioning the
5sec timeout, but it is without doubt an improvement over the current
behavior.
The previously mentioned test as well as a modified version closing the
slave before exit(2) now hang for 5 seconds instead of deadlocking
indefinitely.
I believe we want that for release, ok?
Index: kern/tty.c
===================================================================
RCS file: /cvs/src/sys/kern/tty.c,v
retrieving revision 1.154
diff -u -p -r1.154 tty.c
--- kern/tty.c 7 Apr 2020 13:27:51 -0000 1.154
+++ kern/tty.c 6 May 2020 07:44:53 -0000
@@ -80,6 +80,8 @@ void filt_ttyrdetach(struct knote *kn);
int filt_ttywrite(struct knote *kn, long hint);
void filt_ttywdetach(struct knote *kn);
void ttystats_init(struct itty **, size_t *);
+int ttywait_nsec(struct tty *tp, uint64_t nsecs);
+int ttysleep_nsec(struct tty *, void *, int, char *, uint64_t);
/* Symbolic sleep message strings. */
char ttclos[] = "ttycls";
@@ -1202,10 +1204,10 @@ ttnread(struct tty *tp)
}
/*
- * Wait for output to drain.
+ * Wait for output to drain, or if this times out, flush it.
*/
int
-ttywait(struct tty *tp)
+ttywait_nsec(struct tty *tp, uint64_t nsecs)
{
int error, s;
@@ -1219,7 +1221,10 @@ ttywait(struct tty *tp)
(ISSET(tp->t_state, TS_CARR_ON) || ISSET(tp->t_cflag,
CLOCAL))
&& tp->t_oproc) {
SET(tp->t_state, TS_ASLEEP);
- error = ttysleep(tp, &tp->t_outq, TTOPRI | PCATCH,
ttyout);
+ error = ttysleep_nsec(tp, &tp->t_outq, TTOPRI | PCATCH,
+ ttyout, nsecs);
+ if (error == EWOULDBLOCK)
+ ttyflush(tp, FWRITE);
if (error)
break;
} else
@@ -1229,6 +1234,12 @@ ttywait(struct tty *tp)
return (error);
}
+int
+ttywait(struct tty *tp)
+{
+ return (ttywait_nsec(tp, INFSLP));
+}
+
/*
* Flush if successfully wait.
*/
@@ -1237,7 +1248,8 @@ ttywflush(struct tty *tp)
{
int error;
- if ((error = ttywait(tp)) == 0)
+ error = ttywait_nsec(tp, SEC_TO_NSEC(5));
+ if (error == 0 || error == EWOULDBLOCK)
ttyflush(tp, FREAD);
return (error);
}
@@ -2281,11 +2293,18 @@ tputchar(int c, struct tty *tp)
int
ttysleep(struct tty *tp, void *chan, int pri, char *wmesg)
{
+
+ return (ttysleep_nsec(tp, chan, pri, wmesg, INFSLP));
+}
+
+int
+ttysleep_nsec(struct tty *tp, void *chan, int pri, char *wmesg, uint64_t nsecs)
+{
int error;
short gen;
gen = tp->t_gen;
- if ((error = tsleep_nsec(chan, pri, wmesg, INFSLP)) != 0)
+ if ((error = tsleep_nsec(chan, pri, wmesg, nsecs)) != 0)
return (error);
return (tp->t_gen == gen ? 0 : ERESTART);
}