Derek Tattersall wrote:
> I see several instances of this in /var/log/messages after cvsup'ing
> Monday evening and rebuilding world and kernel.  I haven't seen any
> messages about this, so I figured I'd ask here.
> 
> Message:
> Mar 11 17:33:30 lorne kernel: malloc() of "64" with the following
> non-sleepablelocks held:
> Mar 11 17:33:30 lorne kernel: exclusive sleep mutex netisr lock r = 0
> (0xc0579160) locked @ /usr/src/sys/net/netisr.c:215
> 
> Can anybody supply me a clue as to what's going on here?

Problem with Jonathan Lemon's commit to update the NETISR code
to each have it's own queue.

The problem appears to be that the ni->ni_handler code os called
with the "mtx_lock(&netisr_mtx);" mutex in effect, which is actually
only supposed to protect the netisrs list from deregistration or
reregistration while in the traversal loop.

Probably the most correct thing to so is probe, drop, call,
reacquire, reprobe, and restart on non-matching value (it
looks to me as if the mutex is only protecting the netisrs[]
array,y way of the NULL-ness of ni_handler,  rather than the
elements in the queue ni_queue).

It seems to me that the order of operation in netisr_register()
and netisr_unregister(), and the lack of mutex protection there,
too, is broken.

My suggestion would be to sysctl the net.isr.enable on, to enable
direct dispatch as a workaround.  I like that code path better for
livelock avaoidance anyway.

Otherwise, here's a patch that I think makes things work as they
are supposed to work (but the handler dispatch loop is probably
still wrong, given that it doesn't back off the mutex acquisition
and retry, and happens at clock interrupt, so a blocking mutex
acquisition is probably a bad thing there).

-- Terry
*** netisr.c.old        Tue Mar 11 12:12:01 2003
--- netisr.c    Tue Mar 11 12:24:32 2003
***************
*** 75,82 ****
        
        KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
            ("bad isr %d", num));
!       netisrs[num].ni_handler = handler;
        netisrs[num].ni_queue = inq;
  }
  
  void
--- 75,86 ----
        
        KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
            ("bad isr %d", num));
!       mtx_lock(&netisr_mtx);
!       KASSERT((netisrs[num].ni_handler == NULL)),
!           ("isr already registered %d", num));
        netisrs[num].ni_queue = inq;
+       netisrs[num].ni_handler = handler;
+       mtx_unlock(&netisr_mtx);
  }
  
  void
***************
*** 87,99 ****
        
        KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
            ("bad isr %d", num));
        ni = &netisrs[num];
!       ni->ni_handler = NULL;
!       if (ni->ni_queue != NULL) {
                s = splimp();
                IF_DRAIN(ni->ni_queue);
                splx(s);
        }
  }
  
  struct isrstat {
--- 91,108 ----
        
        KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
            ("bad isr %d", num));
+       mtx_lock(&netisr_mtx);
        ni = &netisrs[num];
!       /* repeat drain of queue until queue is empty and mutex is held */
!       while (ni->ni_queue != NULL) {
!               mtx_unlock(&netisr_mtx);
                s = splimp();
                IF_DRAIN(ni->ni_queue);
                splx(s);
+               mtx_lock(&netisr_mtx);
        }
+       ni->ni_handler = NULL;
+       mtx_unlock(&netisr_mtx);
  }
  
  struct isrstat {
***************
*** 199,204 ****
--- 208,214 ----
        struct netisr *ni;
        struct mbuf *m;
        u_int bits;
+       u_int obits;
        int i;
  #ifdef DEVICE_POLLING
        const int polling = 1;
***************
*** 207,212 ****
--- 217,223 ----
  #endif
  
        mtx_lock(&netisr_mtx);
+ restart:
        do {
                bits = atomic_readandclear_int(&netisr);
                if (bits == 0)
***************
*** 220,225 ****
--- 231,238 ----
                                printf("swi_net: unregistered isr %d.\n", i);
                                continue;
                        }
+                       obits = bits;
+                       mtx_unlock(&netisr_mtx);
                        if (ni->ni_queue == NULL)
                                ni->ni_handler(NULL);
                        else
***************
*** 229,234 ****
--- 242,250 ----
                                                break;
                                        ni->ni_handler(m);
                                }
+                       mtx_lock(&netisr_mtx);
+                       if (obits != atomic_readandclear_int(&netisr))
+                               goto restart;
                }
        } while (polling);
        mtx_unlock(&netisr_mtx);

Reply via email to