Hi Kay,

On 07/20/2016 11:27 AM, Willy Tarreau wrote:
Hi Kay,

On Wed, Jul 20, 2016 at 11:17:57AM +0200, Kay Fuchs wrote:
Hi!

2016-07-19 11:26 GMT+02:00 Kay Fuchs:
I'm using a stick-table with HAProxy 1.6.7 in an active/standby
configuration like this:

 stick-table type ipv6 size 500k expire 60s peers hacluster store gpc0,conn_cur,http_req_rate(10s),http_err_rate(10s)
 http-request track-sc0

On the standby peer the table shows obviously wrong http_err_rate values:

 0xe6ce10: key=xxx use=0 exp=59598 gpc0=0 conn_cur=1 http_req_rate(10000)=1 http_err_rate(10000)=346
 0xe3ed80: key=xxx use=0 exp=58440 gpc0=0 conn_cur=1 http_req_rate(10000)=27 http_err_rate(10000)=38841809

The active peer seems to behave as expected and shows very low error rates.

I'm no programmer, but I think it has to do with "frqp->curr_tick" in
"peers.c", which seems to have the value "0" when the very first error
appears. This leads to sending "now_ms" to the peer. If I check
"frqp->curr_tick" before the encoding, like

 if (frqp->curr_tick == 0)
   frqp->curr_tick = now_ms;

the error rates seem reasonable on the standby peer.
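
A minimal sketch of that reasoning, for illustration only. It assumes the
sender transmits the age of the counter as "now_ms - curr_tick", which is
consistent with the description above but is not quoted from "peers.c".
The struct and function names are hypothetical stand-ins, not HAProxy's.

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-period frequency counter. */
struct sketch_freq_ctr {
    uint32_t curr_tick; /* start of the current period in ms, 0 if never set */
    uint32_t curr_ctr;  /* events counted in the current period */
    uint32_t prev_ctr;  /* events counted in the previous period */
};

/* Assumed value sent to the peer: the age of the current period. */
static uint32_t sketch_tick_delta(const struct sketch_freq_ctr *c, uint32_t now_ms)
{
    return now_ms - c->curr_tick;
}

/* Same computation with the guard from the message applied first. */
static uint32_t sketch_tick_delta_guarded(struct sketch_freq_ctr *c, uint32_t now_ms)
{
    if (c->curr_tick == 0)
        c->curr_tick = now_ms;
    return now_ms - c->curr_tick;
}
```

With "curr_tick" still at 0, the unguarded version yields "now_ms" itself, a
large integer, while the guarded version yields 0.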

I think one of the functions "intencode" or "intdecode" in "peers.c"
does not return the expected values. I wrote a simple loop that
compares the input of "intencode" with the output of "intdecode" for
the encoded message. The first range of integers that is encoded or
decoded wrongly is 4336-4351.
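
Such a round-trip check could look roughly like the sketch below. The encoder
and decoder here are simplified re-implementations written for illustration,
assuming the variable-length scheme implied by the encoded values listed in
this thread; they are not copied from "peers.c", and all "sketch_" names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encoder sketch: values below 240 take a single byte. Larger values put
 * their low 4 bits into the first byte (ORed with 0xf0); the remainder is
 * emitted in continuation bytes with bit 7 set until what is left is
 * below 128, which becomes the final byte. Returns the number of bytes. */
static size_t sketch_intencode(uint64_t v, unsigned char *out)
{
    size_t n = 0;

    if (v < 240) {
        out[n++] = (unsigned char)v;
        return n;
    }
    out[n++] = (unsigned char)v | 0xf0;
    v = (v - 240) >> 4;
    while (v >= 128) {
        out[n++] = (unsigned char)v | 0x80;
        v = (v - 128) >> 7;
    }
    out[n++] = (unsigned char)v;
    return n;
}

/* Decoder sketch with the faulty loop condition discussed in this thread:
 * it keeps reading only while the byte just read is greater than 0x80. */
static uint64_t sketch_intdecode_buggy(const unsigned char *in)
{
    size_t idx = 0;
    uint64_t v = in[0];

    if (v < 240)
        return v;
    do {
        idx++;
        v += (uint64_t)in[idx] << (4 + 7 * (idx - 1));
    } while (in[idx] > 0x80);
    return v;
}

/* Returns the first value below limit that does not survive an
 * encode/decode round trip, or limit if every value does. */
static uint64_t first_mismatch(uint64_t limit)
{
    unsigned char buf[16];
    uint64_t v;

    for (v = 0; v < limit; v++) {
        sketch_intencode(v, buf);
        if (sketch_intdecode_buggy(buf) != v)
            return v;
    }
    return limit;
}
```

Under these assumptions, first_mismatch(100000) returns 4336, and the three
bytes produced for 4336 are 0xf0 0x80 0x01.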

I confirm that 4336-4351 is the first range impacted by the bug I found yesterday in intdecode().

Here are the encoded values for this range:

4336  --> enc. value: 0xf08001
4337  --> enc. value: 0xf18001
4338  --> enc. value: 0xf28001
4339  --> enc. value: 0xf38001
4340  --> enc. value: 0xf48001
4341  --> enc. value: 0xf58001
4342  --> enc. value: 0xf68001
4343  --> enc. value: 0xf78001
4344  --> enc. value: 0xf88001
4345  --> enc. value: 0xf98001
4346  --> enc. value: 0xfa8001
4347  --> enc. value: 0xfb8001
4348  --> enc. value: 0xfc8001
4349  --> enc. value: 0xfd8001
4350  --> enc. value: 0xfe8001
4351  --> enc. value: 0xff8001

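intdecode() stops decoding as soon as it finds a byte less than or equal to 0x80 (128), reading from the high byte to the low byte. For the values above it therefore stops at the 0x80 byte and never reads the trailing 0x01.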

Every value in this impacted range decodes to 0x8f0-0x8ff (2288-2303).
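
The arithmetic can be replayed with a small sketch. It assumes the decoder
starts from the first byte and adds each following byte shifted left by
4, 11, 18, ... bits, which is consistent with the numbers above. The
"stop_below" parameter is a hypothetical knob added only to compare the two
stop conditions, and continuing on 0x80 is just one plausible correction; the
actual patch is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one integer from in. Reading continues while the byte just read
 * is >= stop_below: 0x81 reproduces the behaviour described above (stop on
 * any byte <= 0x80), 0x80 stops only on a byte without bit 7 set. */
static uint64_t sketch_decode(const unsigned char *in, unsigned char stop_below)
{
    size_t idx = 0;
    uint64_t v = in[0];

    if (v < 240)
        return v;
    do {
        idx++;
        v += (uint64_t)in[idx] << (4 + 7 * (idx - 1));
    } while (in[idx] >= stop_below);
    return v;
}

/* Decodes the three-byte pattern <first_byte> 0x80 0x01 from the table. */
static uint64_t decode_range_entry(unsigned char first_byte, unsigned char stop_below)
{
    const unsigned char enc[3] = { first_byte, 0x80, 0x01 };

    return sketch_decode(enc, stop_below);
}
```

For 0xf0 0x80 0x01, stopping on 0x80 gives 240 + (0x80 << 4) = 2288, while
reading on adds (0x01 << 11) = 2048 and gives 4336.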


That might explain
http://thread.gmane.org/gmane.comp.web.haproxy/27168 in combination
with the sending of the large integer "now_ms" reported above.

Very interesting. Yesterday Fred (in CC) found a bug there and we concluded
that it could explain such random issues that we were not able to reproduce
(possibly because we didn't face the exact faulty value). I think we'll have
a patch shortly for this.

Thanks for your feedback,
Willy


