Re: relayd patch - delayed failover

2015-12-04 Thread Brian S. Vangsgaard

Hi Sebastian

You committed the wrong patch.

Please see http://marc.info/?l=openbsd-tech&m=144378086813524&w=2

The patch below results in a relayd panic if more than one host is
available in the group.

Sebastian Benoit wrote on 2015-12-03 17:43:
>thanks, committed
>
>Brian S. Vangsgaard(b...@avalanic.dk) on 2015.10.01 13:27:12 +0200:
>>Hi,
>>
>>Problem:
>>If a client has a state entry in the relayd anchor, and the target
>>server goes down, the client will be unable to "failover" for 10 sec +
>>(10 sec - elapsed time since last SLA check).
>>
>>There are two issues here; this patch only fixes the delayed
>>(10 seconds) failover.
>>
>>When the host fails the SLA check, it will be marked as being down.
>>However, it will not be removed from the anchor before the next SLA
>>check.
>>
>>Reproduce:
>>Start relayd with -dvvv, let it run for 10-20 seconds, then make a
>>host fail its SLA check. Relayd will mark the host as being down when
>>it reaches the next SLA check, but sync_table() will not be called
>>until 10 sec. later (at the next SLA check).
>>
>>Solution:
>>The logic is already in the code, but right now it only handles the
>>statistics and sets the host as being down.
>>
>>Call sync_table() when a host goes from UP to DOWN.
>>
>>
>>Index: pfe.c
>>===================================================================
>>RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
>>retrieving revision 1.79.2.1
>>diff -u -p -u -p -r1.79.2.1 pfe.c
>>--- pfe.c	20 Sep 2015 11:20:16 -0000	1.79.2.1
>>+++ pfe.c	1 Oct 2015 10:48:59 -0000
>>@@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
>> 			table->conf.flags |= F_CHANGED;
>> 			host->flags |= F_DEL;
>> 			host->flags &= ~(F_ADD);
>>+			pfe_sync();
>> 		}
>>
>> 		host->up = st.up;
>>
>>
>>If you need more details or want to fix the scheduler issue, please
>>contact me :)
>>
>>
>>--
>>bsv





Re: relayd patch - delayed failover

2015-12-04 Thread Sebastian Benoit
Brian S. Vangsgaard(b...@avalanic.dk) on 2015.12.04 09:04:19 +0100:
> Hi Sebastian
> 
> You committed the wrong patch.
> 
> Please see http://marc.info/?l=openbsd-tech&m=144378086813524&w=2
> 
> The patch below results in a relayd panic if more than one host is
> available in the group.

i believe i committed the correct one, i just replied to the wrong mail here
on the list. Here is what i put in:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/relayd/pfe.c.diff?r1=1.82=1.83=date

/Benno




Re: relayd patch - delayed failover

2015-12-04 Thread Brian S. Vangsgaard


> i believe i committed the correct one, i just replied to the wrong mail here
> on the list. Here is what i put in:
>
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/relayd/pfe.c.diff?r1=1.82=1.83=date
>
> /Benno



Correct, thank you.








Re: relayd patch - delayed failover

2015-12-03 Thread Sebastian Benoit
thanks, committed

Brian S. Vangsgaard(b...@avalanic.dk) on 2015.10.01 13:27:12 +0200:
> Hi,
> 
> Problem:
> If a client has a state entry in the relayd anchor, and the target
> server goes down, the client will be unable to "failover" for 10 sec +
> (10 sec - elapsed time since last SLA check).
> 
> There are two issues here; this patch only fixes the delayed
> (10 seconds) failover.
> 
> When the host fails the SLA check, it will be marked as being down.
> However, it will not be removed from the anchor before the next SLA check.
> 
> Reproduce:
> Start relayd with -dvvv, let it run for 10-20 seconds, then make a host
> fail its SLA check. Relayd will mark the host as being down when it
> reaches the next SLA check, but sync_table() will not be called until 10
> sec. later (at the next SLA check).
> 
> Solution:
> The logic is already in the code, but right now it only handles the
> statistics and sets the host as being down.
> 
> Call sync_table() when a host goes from UP to DOWN.
> 
> 
> Index: pfe.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
> retrieving revision 1.79.2.1
> diff -u -p -u -p -r1.79.2.1 pfe.c
> --- pfe.c	20 Sep 2015 11:20:16 -0000	1.79.2.1
> +++ pfe.c	1 Oct 2015 10:48:59 -0000
> @@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
>  			table->conf.flags |= F_CHANGED;
>  			host->flags |= F_DEL;
>  			host->flags &= ~(F_ADD);
> +			pfe_sync();
>  		}
> 
>  		host->up = st.up;
> 
> 
> If you need more details or want to fix the scheduler issue, please 
> contact me :)
> 
> 
> --
> bsv
> 

-- 



relayd patch - delayed failover

2015-10-01 Thread Brian S. Vangsgaard

Hi,

Problem:
If a client has a state entry in the relayd anchor, and the target
server goes down, the client will be unable to "failover" for 10 sec + 
(10 sec - elapsed time since last SLA check).
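
As a rough illustration of the arithmetic above, the following C sketch
computes how long a failed host can stay in the pf table. It is a
simplified model and not relayd code: the function name failover_delay()
and its parameters are made up for this example, and the check interval
is assumed to be a fixed 10 seconds as in the description.

```c
/*
 * Simplified model of the failover delay described above.
 * Not relayd code: the name and parameters are hypothetical.
 *
 * interval:  seconds between two checks (10 in the report)
 * elapsed:   seconds since the last check when the host fails,
 *            assumed to satisfy 0 <= elapsed <= interval
 * sync_now:  nonzero if the table is synced as soon as the host
 *            is marked down
 */
static int
failover_delay(int interval, int elapsed, int sync_now)
{
	/* The failure is only noticed at the next check. */
	int detect = interval - elapsed;

	/* Without an immediate sync, the table is updated one check later. */
	return sync_now ? detect : detect + interval;
}
```

For example, failover_delay(10, 3, 0) evaluates to 17 seconds, while
failover_delay(10, 3, 1) evaluates to 7. The second case models the
behavior this patch is aiming for.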


There are two issues here; this patch only fixes the delayed
(10 seconds) failover.

When the host fails the SLA check, it will be marked as being down.
However, it will not be removed from the anchor before the next SLA check.

Reproduce:
Start relayd with -dvvv, let it run for 10-20 seconds, then make a host
fail its SLA check. Relayd will mark the host as being down when it
reaches the next SLA check, but sync_table() will not be called until 10
sec. later (at the next SLA check).

Solution:
The logic is already in the code, but right now it only handles the
statistics and sets the host as being down.


Call sync_table() when a host goes from UP to DOWN.


Index: pfe.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
retrieving revision 1.79.2.1
diff -u -p -u -p -r1.79.2.1 pfe.c
--- pfe.c	20 Sep 2015 11:20:16 -0000	1.79.2.1
+++ pfe.c	1 Oct 2015 10:48:59 -0000
@@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
 			table->conf.flags |= F_CHANGED;
 			host->flags |= F_DEL;
 			host->flags &= ~(F_ADD);
+			pfe_sync();
 		}

 		host->up = st.up;
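
The idea behind the change (sync the table as soon as a host goes from UP
to DOWN instead of waiting for the next check) can be sketched in
isolation. This is a self-contained toy model, not the relayd source and
not the change that was eventually committed: the model_* types and
functions are hypothetical stand-ins for relayd's host, table and sync
code, and the state values are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical host states for this model only. */
enum model_state { MODEL_DOWN = -1, MODEL_UNKNOWN = 0, MODEL_UP = 1 };

struct model_host {
	enum model_state up;	/* last known state */
	bool pending_del;	/* host should be removed from the table */
};

struct model_table {
	int up;			/* number of hosts currently up */
	bool changed;		/* table differs from what was last synced */
	int syncs;		/* how many times the table was synced */
};

/* Stand-in for pushing the table to pf: only acts on a changed table. */
static void
model_sync(struct model_table *t)
{
	if (t->changed) {
		t->syncs++;
		t->changed = false;
	}
}

/*
 * Apply a new check result to a host. Returns true if the table was
 * synced immediately, which only happens on an UP -> not-UP transition.
 */
static bool
model_host_status(struct model_table *t, struct model_host *h,
    enum model_state new_state)
{
	bool was_up = (h->up == MODEL_UP);

	if (h->up == new_state)
		return false;		/* no transition, nothing to do */

	if (new_state == MODEL_UP) {
		t->up++;
		t->changed = true;	/* picked up by the next regular sync */
		h->pending_del = false;
	} else if (was_up) {
		t->up--;
		t->changed = true;
		h->pending_del = true;
	}

	h->up = new_state;

	if (was_up) {
		model_sync(t);		/* do not wait for the next check */
		return true;
	}
	return false;
}
```

In this model a host going from UP to DOWN triggers exactly one sync,
while a later DOWN to UNKNOWN change leaves the table alone. Bringing a
host up only marks the table as changed, so that case still waits for
the regular sync.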


If you need more details or want to fix the scheduler issue, please 
contact me :)



--
bsv



Re: relayd patch - delayed failover

2015-10-01 Thread Brian S. Vangsgaard

Hi again,

I just found a bug in the patch. While testing, I only used one host in
each group, with failover to another group.

That works, but calling sync_table() when a group has multiple hosts
(which we want :) ) causes the parent to exit.


I'll rework the patch and do more testing before submitting again.



> Solution:
> The logic is already in the code, but right now it only handles the
> statistics and sets the host as being down.
>
> Call sync_table() when a host goes from UP to DOWN.
>
>
> Index: pfe.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
> retrieving revision 1.79.2.1
> diff -u -p -u -p -r1.79.2.1 pfe.c
> --- pfe.c	20 Sep 2015 11:20:16 -0000	1.79.2.1
> +++ pfe.c	1 Oct 2015 10:48:59 -0000
> @@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
>  			table->conf.flags |= F_CHANGED;
>  			host->flags |= F_DEL;
>  			host->flags &= ~(F_ADD);
> +			pfe_sync();
>  		}
>
>  		host->up = st.up;





--
bsv