On 18.04.2018 11:08, Thomas Klausner wrote:
> Hi!
> 
> I've recently updated to a NetBSD built on April 3rd. In my latest bulk 
> builds I noticed 
> 
> /netbsd: file: table is full - increase kern.maxfiles or MAXFILES
> 
> It was around 3700, I've bumped it to 8000.
> 
> I wonder why I needed to do that though. Did something start using
> more file descriptors, or is something leaking file descriptors?
> 
> Did anyone else notice something similar?
>  Thomas
> 

Recently, I've started observing the same warning in dmesg(8) and a
related (?) panic(9) in pipe_write().

A KASSERT(9) is triggered in pipelock(), even though we take the kernel
mutex in pipe_write() before calling it:

    static int
    pipe_write(file_t *fp, off_t *offset, struct uio *uio, kauth_cred_t cred,
        int flags)
    {
            struct pipe *wpipe, *rpipe;
            struct pipebuf *bp;
            kmutex_t *lock;
            int error;
            unsigned int wakeup_state = 0;

            /* We want to write to our peer */
            rpipe = fp->f_pipe;
            lock = rpipe->pipe_lock;
            error = 0;

            mutex_enter(lock); // <-- take mutex
            wpipe = rpipe->pipe_peer;

            /*
             * Detect loss of pipe read side, issue SIGPIPE if lost.
             */
            if (wpipe == NULL || (wpipe->pipe_state & PIPE_EOF) != 0) {
                    mutex_exit(lock);
                    return EPIPE;
            }
            ++wpipe->pipe_busy;

            /* Aquire the long-term pipe lock */
            if ((error = pipelock(wpipe, true)) != 0) { // <-- enter here
                    --wpipe->pipe_busy;
                    if (wpipe->pipe_busy == 0) {
                            wpipe->pipe_state &= ~PIPE_RESTART;
                            cv_broadcast(&wpipe->pipe_draincv);
                    }
                    mutex_exit(lock);
                    return (error);
            }


    static int
    pipelock(struct pipe *pipe, bool catch_p)
    {
            int error;

            KASSERT(mutex_owned(pipe->pipe_lock)); // <-- panic, owner=0

            while (pipe->pipe_state & PIPE_LOCKFL) {
                    pipe->pipe_state |= PIPE_LWANT;
                    if (catch_p) {
                            error = cv_wait_sig(&pipe->pipe_lkcv, pipe->pipe_lock);
                            if (error != 0)
                                    return error;
                    } else
                            cv_wait(&pipe->pipe_lkcv, pipe->pipe_lock);
            }

            pipe->pipe_state |= PIPE_LOCKFL;

            return 0;
    }
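
One detail visible in the two excerpts: pipe_write() takes
rpipe->pipe_lock, while pipelock() is called on wpipe and asserts
ownership of wpipe->pipe_lock. The assertion can only hold if both ends
point at the same mutex. Below is a minimal userland sketch of that
invariant. It is not kernel code and not a diagnosis of the panic;
every toy_* name is hypothetical, and the toy mutex tracks only a
"held" flag rather than an owning LWP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for kmutex_t: records only whether it is held. */
struct toy_mutex {
    bool held;
};

/* Hypothetical stand-in for struct pipe: only the fields used here. */
struct toy_pipe {
    struct toy_mutex *pipe_lock;  /* a pointer, as in the excerpt */
    struct toy_pipe  *pipe_peer;
};

static void toy_mutex_enter(struct toy_mutex *m) { m->held = true; }
static void toy_mutex_exit(struct toy_mutex *m)  { m->held = false; }
static bool toy_mutex_owned(const struct toy_mutex *m) { return m->held; }

/*
 * Mirrors the shape of the excerpt: lock through the near end, then
 * check ownership through the peer. Returns the value the
 * KASSERT-style condition would have had.
 */
static bool toy_write_check(struct toy_pipe *rpipe)
{
    struct toy_mutex *lock = rpipe->pipe_lock;
    struct toy_pipe *wpipe;
    bool ok;

    toy_mutex_enter(lock);
    wpipe = rpipe->pipe_peer;
    ok = (wpipe != NULL) && toy_mutex_owned(wpipe->pipe_lock);
    toy_mutex_exit(lock);
    return ok;
}
```

With two toy pipes that share one toy_mutex, toy_write_check() returns
true. If the peer's pipe_lock is pointed at a different toy_mutex, it
returns false, which is the situation the KASSERT would catch.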

This is quite odd because it is new: until recently this machine was
running a userland, packages and kernel from November 2017, and I never
observed any similar problems.

After upgrading src/ and pkgsrc/ to HEAD on this machine I keep
observing the same panic, sometimes as often as 5 times a day.

I cannot reproduce it on demand. Sometimes it happens shortly after the
desktop starts, other times only after a few hours.

Kernel crash dumps do not work for this failure, so I am slowly
investigating the issue by adding debug output here and there.

