Kris, Mehul,
I think this patch (still a WIP) can cure both of your problems,
even though you are seeing different failures in rt_check().
I'd appreciate review and testing.
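
For reviewers, the gateway test that the patch below leaves in rt_check()
can be modelled in userland roughly as follows. This is only a sketch: the
struct is a cut-down stand-in for the kernel's struct rtentry, gw_check()
is a hypothetical name, and all locking is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel definitions (illustration only). */
#define RTF_UP		0x1
#define RTF_GATEWAY	0x2

struct rtentry {
	int		 rt_flags;
	struct rtentry	*rt_gwroute;	/* set up in advance, not in rt_check() */
};

/*
 * Hypothetical helper: models only the gateway test from the patch.
 * A gateway route whose rt_gwroute is missing or down is rejected
 * instead of being looked up again on the output path.
 */
static int
gw_check(const struct rtentry *rt)
{
	if ((rt->rt_flags & RTF_GATEWAY) != 0 &&
	    (rt->rt_gwroute == NULL ||
	    (rt->rt_gwroute->rt_flags & RTF_UP) == 0))
		return (EHOSTUNREACH);
	return (0);
}
```

The point of the sketch is that the check only reads state; it never
allocates or replaces rt_gwroute, so there is nothing left to race on here.
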
On Wed, May 02, 2007 at 02:24:54PM -0400, Kris Kennaway wrote:
K> One of my 7.0 systems has a flaky gateway, and when it goes down the
K> node often goes down with this panic:
K>
K> panic: mtx_lock() of destroyed mutex @ ../../../net/route.c:1306
K> cpuid = 0
K> KDB: enter: panic
K> [thread pid 28619 tid 100074 ]
K> Stopped at kdb_enter+0x68: ta %xcc, 1
K> db> wh
K> Tracing pid 28619 tid 100074 td 0xfffff800140e87e0
K> panic() at panic+0x248
K> _mtx_lock_flags() at _mtx_lock_flags+0x8c
K> rt_check() at rt_check+0x128
K> arpresolve() at arpresolve+0x98
K> ether_output() at ether_output+0x94
K> ip_output() at ip_output+0xc64
K> udp_output() at udp_output+0x680
K> udp_send() at udp_send+0x38
K> sosend_dgram() at sosend_dgram+0x3e0
K> sosend() at sosend+0x74
K> kern_sendit() at kern_sendit+0x14c
K> sendit() at sendit+0x1d4
K> sendto() at sendto+0x48
K> syscall() at syscall+0x2f8
K> -- syscall (133, FreeBSD ELF64, sendto) %o7=0x40aa68ac --
K>
K> I suspect locking is broken in an error case. net/route.c:1306 is in
K> the senderr() macro in rt_check():
K>
K>         /* XXX BSD/OS checks dst->sa_family != AF_NS */
K>         if (rt->rt_flags & RTF_GATEWAY) {
K>                 if (rt->rt_gwroute == NULL)
K>                         goto lookup;
K>                 rt = rt->rt_gwroute;
K> bewm -->        RT_LOCK(rt); /* NB: gwroute */
K>                 if ((rt->rt_flags & RTF_UP) == 0) {
K>                         rtfree(rt); /* unlock gwroute */
K>                         rt = rt0;
K> Kris
On Mon, May 07, 2007 at 07:52:32AM -0700, Mehul Vora wrote:
M> Hi,
M>
M> The current implementation (version 6.2) of the rt_check() routine defined
M> in route.c is not completely MPSAFE. I found an issue when I started
M> routing with "directisr" enabled. For the first received packet this
M> function initializes rt_gateway of the passed rt_entry. This is done by
M> calling the "rtalloc1" routine. But "rt_check" doesn't hold any lock while
M> calling this function. So if multiple instances of "ip_input - netisr" are
M> running, more than one thread can call this routine, which may lead to
M> corruption; in my case it leads to a deadlock. The problem doesn't happen
M> if a single packet of the same kind is sent before the heavy traffic. But
M> if heavy traffic is sent from the start, it happens immediately. I have
M> fixed this and it works well with the fix. A workaround patch for this
M> issue is attached herewith. We probably need to define a macro in route.h
M> for the hardcoded values in the patch. Can anyone confirm this?
M>
M> Thanks,
M> Mehul.
Content-Description: 206142780-rt_check.patch.txt
M> 1260a1261
M> > try_again:
M> 1280a1282,1289
M> >
M> > if(rt0->rt_flags & 0x80000000U){
M> > /*This rt is under process...*/
M> > RT_UNLOCK(rt);
M> > RT_UNLOCK(rt0);
M> > goto try_again;
M> > }
M> >
M> 1281a1291
M> > rt0->rt_flags |= 0x80000000U;
M> 1288a1299
M> > rt0->rt_flags &= (~0x80000000U);
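
On the macro question: a minimal userland sketch of what naming that bit
could look like. RTF_GWBUSY and both helper names are hypothetical and do
not exist in route.h, the struct is a simplified stand-in, and the caller
is assumed to hold the route lock as in the workaround above.

```c
#include <stdbool.h>

/* Hypothetical name for the hardcoded 0x80000000 bit; not in route.h. */
#define RTF_GWBUSY	0x80000000UL

/* Simplified stand-in for the kernel's struct rtentry. */
struct rtentry {
	unsigned long	rt_flags;
};

/*
 * Try to claim the gateway lookup for rt0.  Returns false if another
 * thread already marked the entry busy, in which case the caller would
 * drop its locks and retry, as the "goto try_again" in the patch does.
 */
static bool
gw_lookup_claim(struct rtentry *rt0)
{
	if ((rt0->rt_flags & RTF_GWBUSY) != 0)
		return (false);
	rt0->rt_flags |= RTF_GWBUSY;
	return (true);
}

/* Clear the busy mark once the lookup has finished. */
static void
gw_lookup_release(struct rtentry *rt0)
{
	rt0->rt_flags &= ~RTF_GWBUSY;
}
```

Giving the constant the same width as the flags field keeps the
complement in gw_lookup_release() from touching any other bits.
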
M> _______________________________________________
M> [email protected] mailing list
M> http://lists.freebsd.org/mailman/listinfo/freebsd-net
M> To unsubscribe, send any mail to "[EMAIL PROTECTED]"
--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Index: route.c
===================================================================
RCS file: /home/ncvs/src/sys/net/route.c,v
retrieving revision 1.119
diff -u -p -r1.119 route.c
--- route.c 22 May 2007 16:17:31 -0000 1.119
+++ route.c 23 May 2007 11:48:14 -0000
@@ -392,6 +392,14 @@ rtredirect(struct sockaddr *dst,
*/
rt_setgate(rt, rt_key(rt), gateway);
}
+
+ KASSERT(rt->rt_gateway != NULL,
+ ("RTF_GATEWAY and rt_gateway is NULL"));
+ /* Set up rt_gwroute. */
+ rt->rt_gwroute = rtalloc1(rt->rt_gateway, 1, 0UL);
+ KASSERT(rt != rt->rt_gwroute, ("Oops"));
+ if (rt->rt_gwroute != NULL)
+ RT_UNLOCK(rt->rt_gwroute);
} else
error = EHOSTUNREACH;
done:
@@ -1295,32 +1303,10 @@ rt_check(struct rtentry **lrt, struct rt
return (EHOSTUNREACH);
rt0 = rt;
}
- /* XXX BSD/OS checks dst->sa_family != AF_NS */
- if (rt->rt_flags & RTF_GATEWAY) {
- if (rt->rt_gwroute == NULL)
- goto lookup;
- rt = rt->rt_gwroute;
- RT_LOCK(rt); /* NB: gwroute */
- if ((rt->rt_flags & RTF_UP) == 0) {
- RTFREE_LOCKED(rt); /* unlock gwroute */
- rt = rt0;
- lookup:
- RT_UNLOCK(rt0);
- rt = rtalloc1(rt->rt_gateway, 1, 0UL);
- if (rt == rt0) {
- rt0->rt_gwroute = NULL;
- RT_REMREF(rt0);
- RT_UNLOCK(rt0);
- return (ENETUNREACH);
- }
- RT_LOCK(rt0);
- rt0->rt_gwroute = rt;
- if (rt == NULL) {
- RT_UNLOCK(rt0);
- return (EHOSTUNREACH);
- }
- }
- RT_UNLOCK(rt0);
+ if (rt->rt_flags & RTF_GATEWAY && (rt->rt_gwroute == NULL ||
+ (rt->rt_gwroute->rt_flags & RTF_UP) == 0)) {
+ RT_UNLOCK(rt);
+ return (EHOSTUNREACH);
}
/* XXX why are we inspecting rmx_expire? */
error = (rt->rt_flags & RTF_REJECT) &&