I'm wondering, the corrupted address was always at "wait_event" in h2s
struct, after its removal in:
http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37
crashes went away.
But with the above patch and after altering h2s struct into:
struct h2s {
[...]
struct tasklet *shut_tl;
struct wait_event *recv_wait; /* recv wait_event the conn_stream
associated is waiting on (via h2_subscribe) */
struct wait_event *send_wait; /* send wait_event the conn_stream
associated is waiting on (via h2_subscribe) */
struct list list; /* To be used when adding in h2c->send_list or
h2c->fctl_lsit */
};
the crash returned.
However after recv_wait and send_wait were merged in:
http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca
crashes went away again.
In my opinion shut_tl should be corrupted again, but it is not. Maybe the
last patch fixed it?
pon., 2 lis 2020 o 15:37 Kirill A. Korinsky <[email protected]> napisał(a):
> Maciej,
>
> Looks like memory corruption is still here but it corrupt just some
> another place.
>
> Willy do you agree?
>
> --
> wbr, Kirill
>
> On 2. Nov 2020, at 15:34, Maciej Zdeb <[email protected]> wrote:
>
> So after Kirill suggestion to modify h2s struct in a way that tasklet
> "shut_tl" is before recv_wait I verified if in 2.2.4 the same crash will
> occur aaaand it did not!
>
> After the patch that merges recv_wait and send_wait:
> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca
> and witch such h2s (tasklet shut_tl before wait_event subs) the crashes
> are gone:
>
> struct h2s {
> [...]
> struct buffer rxbuf; /* receive buffer, always valid (buf_empty or
> real buffer) */
> struct tasklet *shut_tl; /* deferred shutdown tasklet, to retry
> to send an RST after we failed to,
> * in case there's no other subscription
> to do it */
> struct wait_event *subs; /* recv wait_event the conn_stream
> associated is waiting on (via h2_subscribe) */
> struct list list; /* To be used when adding in h2c->send_list or
> h2c->fctl_lsit */
> };
>
>
>
> pon., 2 lis 2020 o 12:42 Maciej Zdeb <[email protected]> napisał(a):
>
>> Great idea Kirill,
>>
>> With such modification:
>>
>> struct h2s {
>> [...]
>> struct tasklet *shut_tl;
>> struct wait_event *recv_wait; /* recv wait_event the conn_stream
>> associated is waiting on (via h2_subscribe) */
>> struct wait_event *send_wait; /* send wait_event the conn_stream
>> associated is waiting on (via h2_subscribe) */
>> struct list list; /* To be used when adding in h2c->send_list or
>> h2c->fctl_lsit */
>> };
>>
>> it crashed just like before.
>>
>> pon., 2 lis 2020 o 11:12 Kirill A. Korinsky <[email protected]>
>> napisał(a):
>>
>>> Hi,
>>>
>>> Thanks for update.
>>>
>>> After read Wully's recommendation and provided commit that fixed an
>>> issue I'm curious can you "edit" a bit this commit and move `shut_tl`
>>> before `recv_wait` instead of removed `wait_event`?
>>>
>>> It is a quiet dummy way to confirm that memory corruption had gone, and
>>> not just moved to somewhere else.
>>>
>>> --
>>> wbr, Kirill
>>>
>>> On 2. Nov 2020, at 10:58, Maciej Zdeb <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Update for people on the list that might be interested in the issue,
>>> because part of discussion was private.
>>>
>>> I wanted to check Willy suggestion and modified h2s struct (added dummy
>>> fields):
>>>
>>> struct h2s {
>>> [...]
>>> uint16_t status; /* HTTP response status */
>>> unsigned long long body_len; /* remaining body length according
>>> to content-length if H2_SF_DATA_CLEN */
>>> struct buffer rxbuf; /* receive buffer, always valid (buf_empty
>>> or real buffer) */
>>> int dummy0;
>>> struct wait_event wait_event; /* Wait list, when we're
>>> attempting to send a RST but we can't send */
>>> int dummy1;
>>> struct wait_event *recv_wait; /* recv wait_event the conn_stream
>>> associated is waiting on (via h2_subscribe) */
>>> int dummy2;
>>> struct wait_event *send_wait; /* send wait_event the conn_stream
>>> associated is waiting on (via h2_subscribe) */
>>> int dummy3;
>>> struct list list; /* To be used when adding in h2c->send_list or
>>> h2c->fctl_lsit */
>>> struct list sending_list; /* To be used when adding in
>>> h2c->sending_list */
>>> };
>>>
>>> With such modified h2s struct, the crash did not occur.
>>>
>>> I've checked HAProxy 2.1, it crashes like 2.0.
>>>
>>> I've also checked 2.2, bisection showed that this commit:
>>> http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37
>>> fixed the crashes we experienced. I'm not sure how/if it fixed the memory
>>> corruption, it is possible that memory is still corrupted but not causing
>>> the crash.
>>>
>>>
>>>
>>> pt., 25 wrz 2020 o 16:25 Kirill A. Korinsky <[email protected]>
>>> napisał(a):
>>>
>>>> Very interesting.
>>>>
>>>> Anyway, I can see that this pice of code was refactored some time ago:
>>>> https://github.com/haproxy/haproxy/commit/f96508aae6b49277dcf142caa35042678cf8e2ca
>>>>
>>>> Maybe it is worth to try 2.2 or 2.3 branch?
>>>>
>>>> Yes, it is a blind shot and just a guess.
>>>>
>>>> --
>>>> wbr, Kirill
>>>>
>>>> On 25. Sep 2020, at 16:01, Maciej Zdeb <[email protected]> wrote:
>>>>
>>>> Yes at the same place with same value:
>>>>
>>>> (gdb) bt full
>>>> #0 0x0000559ce98b964b in h2s_notify_recv (h2s=0x559cef94e7e0) at
>>>> src/mux_h2.c:783
>>>> sw = 0xffffffff
>>>>
>>>>
>>>>
>>>> pt., 25 wrz 2020 o 15:42 Kirill A. Korinsky <[email protected]>
>>>> napisał(a):
>>>>
>>>>> > On 25. Sep 2020, at 15:26, Maciej Zdeb <[email protected]> wrote:
>>>>> >
>>>>> > I was mailing outside the list with Willy and Christopher but it's
>>>>> worth sharing that the problem occurs even with nbthread = 1. I've managed
>>>>> to confirm it today.
>>>>>
>>>>>
>>>>> I'm curious is it crashed at the same place with the same value?
>>>>>
>>>>> --
>>>>> wbr, Kirill
>>>>>
>>>>>
>>>>>
>>>>
>>>
>