On 05/04/06, tony sarendal <[EMAIL PROTECTED]> wrote:
>
>
>
> On 05/04/06, Claudio Jeker <[EMAIL PROTECTED]> wrote:
> >
> > On Wed, Apr 05, 2006 at 01:19:44PM +0100, tony sarendal wrote:
> > > On 05/04/06, Claudio Jeker <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On Wed, Apr 05, 2006 at 12:30:56PM +0100, tony sarendal wrote:
> > > > > On 05/04/06, tony sarendal <[EMAIL PROTECTED]> wrote:
> > > >
> > > > ...
> > > >
> > > > > > On a side note, at this stage I did:
> > > > > >
> > > > > > cr211-FRA# bgpctl reload
> > > > > > reload request sent.
> > > > > > cr211-FRA#
> > > > > >
> > > > > > on the neighbor cr212-FRA I get this:
> > > > > >
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[2618]: neighbor 172.16.1.21: received notification: Cease, unknown subcode 0
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[2618]: neighbor 172.16.1.21: state change Established -> Idle, reason: NOTIFICATION received
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[16469]: neighbor 10.1.1.29(AS65000) withdraw 10.0.0.6/32
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[16469]: neighbor 10.1.1.29(AS65000) withdraw 10.1.1.20/30
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[16469]: fatal in RDE: attr_diff: equal attributes encountered
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[3196]: Lost child: route decision engine exited
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[2618]: fatal in SE: session_dispatch_imsg: pipe closed: Connection refused
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[3196]: kernel routing table decoupled
> > > > > > > Apr 5 13:13:39 cr212-FRA bgpd[3196]: Terminating
> > > > > >
> > > > > > Doh, sent too early. The RDE on ar213-FRA also shut down when doing
> > > > > > the bgpctl reload:
> > > > >
> > > > > > Apr 5 12:52:46 ar213-FRA bgpd[4507]: neighbor 172.16.1.18: state change OpenConfirm -> Established, reason: KEEPALIVE message received
> > > > > > Apr 5 12:52:46 ar213-FRA bgpd[5938]: nexthop 172.16.1.18 now valid: directly connected
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[19227]: fatal in RDE: attr_diff: equal attributes encountered
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[5938]: Lost child: route decision engine exited
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[4507]: neighbor 192.168.30.10: state change Established -> Idle, reason: Stop
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[4507]: neighbor 172.16.1.22: state change Established -> Idle, reason: Stop
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[4507]: neighbor 172.16.1.18: state change Established -> Idle, reason: Stop
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[5938]: kernel routing table decoupled
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[4507]: session engine exiting
> > > > > > Apr 5 13:14:19 ar213-FRA bgpd[5938]: Terminating
> > > > >
> > > > > Time for a break.
> > > > >
> > > >
> > > > The following diff kills the fatalx() and keeps you running. THIS IS A HACK!
> > > > The real problem is in rde_reflector() -- it modifies attributes
> > > > that are referenced in the cache and so you end up in a major fuckup.
> > > > The function needs some rework similar to the way communities are handled.
> > > >
> > > > I'll have a diff later today or early tomorrow.
> > >
> > >
> > > No worries Claudio, take your time.
> > > I prefer to have something that you consider good instead of a hack,
> > > I'm still in the lab slowly stepping my way to where I want to go with
> > this.
> > >
> >
> > Your lucky day. Had to reschedule some work and had a few mins to create
> > the following diff. Compiles but untested -- my lab is missing a
> > route-reflector :( -- but the change is obvious.
> >
> > This should fix the "attr_diff: equal attributes encountered" fatal.
> > --
> > :wq Claudio
> >
> > ? obj
> > Index: rde.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
> > retrieving revision 1.203
> > diff -u -p -r1.203 rde.c
> > --- rde.c 5 Apr 2006 13:24:28 -0000 1.203
> > +++ rde.c 5 Apr 2006 13:25:19 -0000
> > @@ -1431,6 +1431,7 @@ int
> > rde_reflector(struct rde_peer *peer, struct rde_aspath *asp)
> > {
> > struct attr *a;
> > + u_int8_t *p;
> > u_int16_t len;
> > u_int32_t id;
> >
> > @@ -1459,15 +1460,19 @@ rde_reflector(struct rde_peer *peer, str
> > sizeof(conf->clusterid)) == 0)
> > return (0);
> >
> > - /* prepend own clusterid */
> > - if ((a->data = realloc(a->data, a->len +
> > - sizeof(conf->clusterid))) == NULL)
> > + /* prepend own clusterid by replacing attribute */
> > + len = a->len + sizeof(conf->clusterid);
> > + if (len < a->len)
> > + fatalx("rde_reflector: cluster-list overflow");
> > + if ((p = malloc(len)) == NULL)
> > fatal("rde_reflector");
> > - memmove(a->data + sizeof(conf->clusterid),
> > - a->data, a->len);
> > - a->len += sizeof(conf->clusterid);
> > - memcpy(a->data, &conf->clusterid,
> > - sizeof(conf->clusterid));
> > + memcpy(p, &conf->clusterid, sizeof(conf->clusterid));
> > + memcpy(p + sizeof(conf->clusterid), a->data, a->len);
> > + attr_free(asp, a);
> > + if (attr_optadd(asp, ATTR_OPTIONAL, ATTR_CLUSTER_LIST,
> > + p, len) == -1)
> > + fatalx("attr_optadd failed but impossible");
> > + free(p);
> > } else if (attr_optadd(asp, ATTR_OPTIONAL, ATTR_CLUSTER_LIST,
> > &conf->clusterid, sizeof(conf->clusterid)) == -1)
> > fatalx("attr_optadd failed but impossible");
> > Index: rde.h
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/bgpd/rde.h,v
> > retrieving revision 1.92
> > diff -u -p -r1.92 rde.h
> > --- rde.h 5 Apr 2006 13:24:29 -0000 1.92
> > +++ rde.h 5 Apr 2006 13:25:20 -0000
> > @@ -273,6 +273,7 @@ struct attr *attr_optget(const struct rd
> > void attr_copy(struct rde_aspath *, struct rde_aspath *);
> > int attr_compare(struct rde_aspath *, struct rde_aspath *);
> > void attr_freeall(struct rde_aspath *);
> > +void attr_free(struct rde_aspath *, struct attr *);
> > #define attr_optlen(x) \
> > ((x)->len > 255 ? (x)->len + 4 : (x)->len + 3)
> >
> > Index: rde_attr.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/bgpd/rde_attr.c,v
> > retrieving revision 1.64
> > diff -u -p -r1.64 rde_attr.c
> > --- rde_attr.c 15 Mar 2006 11:26:45 -0000 1.64
> > +++ rde_attr.c 5 Apr 2006 13:25:20 -0000
> > @@ -29,8 +29,6 @@
> > #include "bgpd.h"
> > #include "rde.h"
> >
> > -void attr_free(struct rde_aspath *, struct attr *);
> > -
> > int
> > attr_write(void *p, u_int16_t p_len, u_int8_t flags, u_int8_t type,
> > void *data, u_int16_t data_len)
> >
> >
>
> Looking good, it passed the test that always caused rde to exit.
> I'll bring up the rest of the network and flap/reload a bit.
>
>
>
I have tested a bit more; every patch so far looks good in the tests.
The test network (8 core routers, 3 access routers) re-routes really fast
on events which don't use the hold-down timer.
Running mtr on a 4-hop path with a one-second interval has not shown any
packet loss when I do "shutdown -r" on one of the core routers in the path.
This is in a BGP-only network where a router only peers with its directly
connected neighbors.
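
For anyone reading this in the archive, the idea behind the rde.c change
quoted above can be sketched outside of bgpd. This is only an illustration
under my own assumptions, not the committed code: cluster_list_prepend() is
a hypothetical helper name, and it returns NULL where bgpd would call
fatal()/fatalx(). What it shows is the technique from the diff: build a new
buffer with the cluster ID in front of the old list, and leave the old
(possibly cached and shared) data untouched instead of realloc'ing it in
place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of bgpd: return a newly allocated buffer
 * holding `id` followed by the old cluster list. The old buffer is only
 * read, never modified. Returns NULL on length overflow or allocation
 * failure; on success the caller owns the result and *new_len is its size.
 */
uint8_t *
cluster_list_prepend(const uint8_t *old, uint16_t old_len, uint32_t id,
    uint16_t *new_len)
{
	uint16_t len;
	uint8_t *p;

	assert(new_len != NULL);

	/* 16-bit lengths wrap, so check for overflow the way the diff does */
	len = (uint16_t)(old_len + sizeof(id));
	if (len < old_len)
		return NULL;

	if ((p = malloc(len)) == NULL)
		return NULL;

	/* own cluster ID first, then a copy of the existing list */
	memcpy(p, &id, sizeof(id));
	if (old_len > 0)
		memcpy(p + sizeof(id), old, old_len);

	*new_len = len;
	return p;
}
```

In the real diff the new buffer is handed to attr_optadd() after the old
attribute has been dropped with attr_free(), and is then freed by the caller.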
--
Tony Sarendal - [EMAIL PROTECTED]
IP/Unix
-= The scorpion replied,
"I couldn't help it, it's my nature" =-