On Mon, May 21, 2012 at 11:20 PM, Rafael Zalamena <rzalam...@gmail.com> wrote:
> On Mon, May 21, 2012 at 11:05 PM, Rafael Zalamena <rzalam...@gmail.com> wrote:
>> On Mon, May 21, 2012 at 5:16 PM, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
>>> On Thu, May 10, 2012 at 08:19:58PM -0300, Rafael Zalamena wrote:
>>>> ...
>>> The ifp passed to ifaof_ifpforaddr() is NULL. How that can happen is
>>> unclear to me, it seems like the found ifa is not valid anymore.
>>> Is this crash easy to trigger? Can I get you're hostname.* files,
>>> ospfd.conf and ldpd.conf for all three boxes?
>>>
>> ...
>>
>>
>> ALIX1:
>> ==> /etc/hostname.lo1
>> 10.0.10.1/32
>> ==> /etc/hostname.mpe0
>> mplslabel 666
>> 192.168.1.200/32
>> ==> /etc/hostname.vr0
>> 192.168.1.200/24
>> !route add default 192.168.1.254
>> ==> /etc/hostname.vr1
>> 10.0.1.1/24 mpls
>> ==> /etc/hostname.vr2
>> 10.0.2.1/24 mpls
>> ==> /etc/ospfd.conf
>> router-id 10.0.10.1
>>
>> area 0.0.0.0 {
>> interface vr0
>> interface vr1
>> interface vr2
>> interface lo1
>> }
>> ==> /etc/ldpd.conf
>> router-id 10.0.10.1
>>
>> interface vr1
>> interface vr2
>>
>>
>> The setup topology is: http://dl.dropbox.com/u/222135/partial.png
>> For more information about the setup, please see the "MPLS Setup" thread I made.
>>
>> Steps to reproduce:
>> 1 - Configure ALIX1 interfaces, ospf, ldpd
>> 2 - Start interfaces and then daemons (ospf first)
>> 3 - Repeate for 2 and 3.
>> 4 - While repeating the process for ALIX3 it panics.
>>
>> ALIX 3 crashed while starting LDPd with the others running (maybe its
>> a event storm thing?). I might have forgotten something, but once
>> everything is placed it doesn't happen anymore, so we can try to
>> reproduce it by reconfiguring one of the hosts while the others one
>> are working.
>>
>> ...
>
> OK, after just a little bit of thinkering I've got something.
>
> After booting up ALIX1, I played some commands and here is what I've got.
>
> # ifconfig vr0 alias delete
> # pkill ldpd
> # ldpd -dv &
> [1] 1730
> # startup
> ]accept_add: acceuvm_fault(0xd54eb880, 0x0, 0, 1) -> e
> pting on fd 11
> kaccept_add: acceepting on fd 9
> irf_act_start: intnerface vr2 link edown
> if_fsm: evlent UP resulted :in action START and changing stapte for
> interfacea vr2 from DOWN tgo ACTIVE
> if_fsme: event UP resul ted in action STfART and changinga state for
> interuface vr1 from DOlWN to ACTIVE
> ketrnel add route 0 .0.0.0/0
> kernelt add route 10.0.r1.0/24
> kernel aadd route 10.0.1.p0/24
> kernel add, route 10.0.2.0/ 24
> kernel add rcoute 10.0.3.0/24o
> eernel add roudte 10.0.10.1/32
> kernel add rout=e 10.0.10.2/32
> 0kernel add route
> 10.0.10.3/32
> Stopped at ifaof_ifpforaddr+0x26: movl 0x14(%edx),%edx
> ddb> ps
>    PID   PPID   PGRP  UID  S      FLAGS  WAIT       COMMAND
>   6095   1730   1730   98  3       0x80  kqread     ldpd
>   2761   1730   1730   98  3       0x80  kqread     ldpd
> * 1730  11124   1730    0  7          0             ldpd
>   9946  26755  26755    0  3       0x88  pause      sendmail
>  26755   4320  26755    0  3       0x80  select     sendmail
>  11124      1  11124    0  3       0x80  ttyin      ksh
>  18447      1  18447    0  3       0x80  select     cron
>  26945      1  26945   99  3       0x80  poll       sndiod
>  13366      1  13366    0  3       0x80  select     inetd
>   4320      1   4933    0  3       0x88  pause      sendmail
>  29378  13835  13835   85  3       0x80  kqread     ospfd
>   2733  13835  13835   85  3       0x80  kqread     ospfd
>  13835      1  13835    0  3       0x80  kqread     ospfd
>  24601      1  24601    0  3       0x80  select     sshd
>  21983   2988   2988   74  3       0x80  bpf        pflogd
>   2988      1   2988    0  3       0x80  netio      pflogd
>  18275  13196  13196   73  2       0x80             syslogd
>  13196      1  13196    0  3       0x80  netio      syslogd
>   7697      1   7697    0  3       0x80  mfsidl     mount_mfs
>  21567      1  21567    0  3       0x80  mfsidl     mount_mfs
>  23010      1  23010    0  3       0x80  mfsidl     mount_mfs
>     13      0      0    0  3   0x100200  aiodoned   aiodoned
>     12      0      0    0  3   0x100200  syncer     update
>     11      0      0    0  3   0x100200  cleaner    cleaner
>     10      0      0    0  3   0x100200  reaper     reaper
>      9      0      0    0  3   0x100200  pgdaemon   pagedaemon
>      8      0      0    0  3   0x100200  bored      crypto
>      7      0      0    0  3   0x100200  pftm       pfpurge
>      6      0      0    0  3   0x100200  usbtsk     usbtask
>      5      0      0    0  3   0x100200  usbatsk    usbatsk
>      4      0      0    0  3   0x100200  bored      syswq
>      3      0      0    0  3 0x40100200             idle0
>      2      0      0    0  3   0x100200  kmalloc    kmthread
>      1      0      1    0  3       0x80  wait       init
>      0     -1      0    0  3      0x200  scheduler  swapper
> ddb> trace
> ifaof_ifpforaddr(d11effd8,0,0,d0519707,d11ef000) at ifaof_ifpforaddr+0x26
> ifa_ifwithroute(140003,d11effd8,d11effe8,0,f37bec00) at ifa_ifwithroute+0x61
> rt_getifa(f37becfc,0,f37bec8c,d03dacfc,40) at rt_getifa+0xe2
> rtrequest1(1,f37becfc,8,f37bed54,0) at rtrequest1+0x5f7
> route_output(d5508700,d523c444,d5508700,0,0) at route_output+0xe38
> route_usrreq(d523c444,9,d5508700,0,0) at route_usrreq+0x65
> sosend(d523c444,0,f37beec0,d5508700,0) at sosend+0x476
> soo_write(d52371bc,d52371d8,f37beec0,d54fa5a0,cfcf0014) at soo_write+0x3b
> dofilewritev(d526f45c,4,d52371bc,cfbcfbc0,3) at dofilewritev+0x131
> sys_writev(d526f45c,f37bef64,f37bef84,d057b7da,d526f45c) at sys_writev+0x7c
> syscall() at syscall+0x26a
> --- syscall (number 0) ---
> 0x2:
> ddb>

I cleaned up the quotes from the e-mail to keep only what's necessary.

I've made a diff that solves the panic, but it does not solve the main
problem.

I investigated the problem in the time I had and noticed that it happens
because routes were still referencing vr0 at a moment when no alias was
configured on that interface.

The diff below avoids the panic by not following the route's address back
to the interface it belonged to when that back pointer is NULL. However,
while it fixes the panic, it also causes LDPd to print an error message
saying that something is wrong with the routes left pointing to vr0.

Index: sys/net/route.c
===================================================================
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.136
diff -u -p -r1.136 route.c
--- sys/net/route.c	9 May 2012 06:50:55 -0000	1.136
+++ sys/net/route.c	23 May 2012 12:12:01 -0000
@@ -646,6 +646,9 @@ ifa_ifwithroute(int flags, struct sockad
 		if ((ifa = rt->rt_ifa) == NULL)
 			return (NULL);
 	}
+	/* Don't search interfaces address if there is no pointer back */
+	if (ifa->ifa_ifp == NULL)
+		return (NULL);
 	if (ifa->ifa_addr->sa_family != dst->sa_family) {
 		struct ifaddr	*oifa = ifa;
 		ifa = ifaof_ifpforaddr(dst, ifa->ifa_ifp);
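
To make the effect of the check easier to see outside the kernel, here is a
minimal userland sketch in C. Everything in it is an assumption made for
illustration: the structs are stripped-down stand-ins (one address per
interface instead of a list, an int instead of a sockaddr), and the model_*
functions are hypothetical names, not the real code in sys/net/route.c. It
only shows the shape of the problem: the lookup dereferences ifp
unconditionally, so the caller has to bail out when ifa_ifp is NULL.

```c
#include <stddef.h>

struct ifaddr;

/* Hypothetical stand-in for struct ifnet: one address instead of a list. */
struct ifnet {
	const char	*if_xname;
	struct ifaddr	*if_addr;
};

/* Hypothetical stand-in for struct ifaddr. */
struct ifaddr {
	int		 ifa_family;	/* models ifa_addr->sa_family */
	struct ifnet	*ifa_ifp;	/* back pointer, NULL when detached */
};

/*
 * Models the lookup that crashed: it reads through ifp without checking
 * it, so a NULL ifp here is a NULL pointer dereference.
 */
struct ifaddr *
model_ifaof_ifpforaddr(int family, struct ifnet *ifp)
{
	struct ifaddr *ifa = ifp->if_addr;

	if (ifa != NULL && ifa->ifa_family == family)
		return (ifa);
	return (NULL);
}

/* Models the tail of ifa_ifwithroute() with the guard from the diff. */
struct ifaddr *
model_ifa_ifwithroute(struct ifaddr *ifa, int dst_family)
{
	if (ifa == NULL)
		return (NULL);
	/* The added check: no back pointer, so do not search the interface. */
	if (ifa->ifa_ifp == NULL)
		return (NULL);
	if (ifa->ifa_family != dst_family) {
		struct ifaddr *oifa = ifa;

		ifa = model_ifaof_ifpforaddr(dst_family, ifa->ifa_ifp);
		if (ifa == NULL)
			ifa = oifa;	/* keep the original if nothing better */
	}
	return (ifa);
}
```

In this model, calling model_ifa_ifwithroute() with an address whose ifa_ifp
is NULL returns NULL instead of crashing in the lookup. That matches what I
see with the diff applied: the panic is gone, but the caller still gets "no
address" for routes left pointing to vr0, which is why the underlying
problem remains.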