On 2016-03-29 15:13, Christian Ruppert wrote:
On 2016-03-29 10:58, Christian Ruppert wrote:
Hi Willy,

On 2016-03-25 18:17, Willy Tarreau wrote:
On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote:
I think it's a different issue (though I could be wrong), since Christian
spoke about counters suddenly doubling. The issue you faced, Sylvain,
which unfortunately I still have no idea how to fix, is that the peers
applet is not always woken up when a connection is established on the
other side, so it may simply miss an event. Everything then remains
stable and appears frozen until the connection closes. Here, data seem
to be exchanged, but incorrectly. This one could be easier to reproduce,
however; we'll check.

OK, I found it. It was indeed easy to reproduce. The frequency counters are sent as "now - freq.date", which is a positive age relative to the current date. But on receipt, this age was *added* to the current date instead of being subtracted. Since the resulting date was always in the future, the counters were always expired if the activity switched sides in less than the
counter's measuring period (e.g. 10s).

I'm committing this simple fix; you can apply it to your tree for now.

Cheers,
Willy

diff --git a/src/peers.c b/src/peers.c
index c29ea73..9918dac 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -1153,7 +1153,7 @@ switchstate:
                         case STD_T_FRQP: {
                                 struct freq_ctr_period data;
 
-                                data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end));
+                                data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end));
                                 if (!msg_cur) {
                                         /* malformed message */
                                         appctx->st0 = PEER_SESS_ST_ERRPROTO;

Thanks a lot for the fast investigation! The proposed patch seems to
do the trick :)

Hrm, or not. At least not completely.
There still seems to be something wrong:
20160329 15:07:03: 0x3bca858: key=xx.xx.xx.xx use=0 exp=28799601 gpc0=0 conn_cnt=682 conn_rate(10000)=1 conn_cur=3 sess_cnt=1 sess_rate(10000)=-1032058827 http_req_cnt=0 http_req_rate(10000)=2272 http_err_cnt=3 http_err_rate(10000)=1143800 bytes_in_cnt=0 bytes_out_cnt=247977
Note that sess_rate is a negative integer. http_err_rate seems to be
affected as well, and even http_req_rate still seems to be wrong in
some cases.
20160329 15:11:38: 0x3e67318: key=xx.xx.xx.xx use=0 exp=28605259 gpc0=0 conn_cnt=86 conn_rate(10000)=0 conn_cur=7 sess_cnt=0 sess_rate(10000)=0 http_req_cnt=0 http_req_rate(10000)=349038424 http_err_cnt=6 http_err_rate(10000)=0 bytes_in_cnt=0 bytes_out_cnt=3261818950
We're using httpclose, so in this case it *actually* should match
conn_cnt, i.e. 86.

I haven't had enough time to look into it yet, but it looks like I had one case where now_ms itself was used as the value, which would explain the integer overflow in sess_rate if something is then added to it.

--
Regards,
Christian Ruppert
