Hey,

I'm wondering, does it related to this code:

+       /* some tasks may have woken other ones up */
+       if (max_processed && thread_has_tasks())
+               goto not_done_yet;
+

from 
http://git.haproxy.org/?p=haproxy-2.2.git;a=blobdiff;f=src/task.c;h=500223f185bf324c0adb34a42ec0244e638ce63e;hp=1a7f44d9169e0a01d42ba13d8d335102aa43577b;hb=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba;hpb=77015abe0bcfde67bff519b1d48393a513015f77
 
<http://git.haproxy.org/?p=haproxy-2.2.git;a=blobdiff;f=src/task.c;h=500223f185bf324c0adb34a42ec0244e638ce63e;hp=1a7f44d9169e0a01d42ba13d8d335102aa43577b;hb=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba;hpb=77015abe0bcfde67bff519b1d48393a513015f77>

as far as I understand it should be safe to remove (with not_done_yet label).

Can you try it?

--
wbr, Kirill

> On 3. Nov 2020, at 15:15, Maciej Zdeb <[email protected]> wrote:
> 
> I modified h2s struct in 2.2 branch with HEAD set to 
> f96508aae6b49277dcf142caa35042678cf8e2ca "MEDIUM: mux-h2: merge recv_wait and 
> send_wait event notifications" like below (subs is in exact place of removed 
> wait_event):
> 
> struct h2s {
>         [...]
>         struct tasklet *dummy0;
>         struct wait_event *dummy1;
>         struct wait_event *subs;      /* recv wait_event the conn_stream 
> associated is waiting on (via h2_subscribe) */
>         struct list list; /* To be used when adding in h2c->send_list or 
> h2c->fctl_lsit */
>         struct tasklet *shut_tl;  /* deferred shutdown tasklet, to retry to 
> send an RST after we failed to,
>                                    * in case there's no other subscription to 
> do it */
> }
> 
> it crashed like before with subs = 0xffffffff:
> 
> (gdb) p *(struct h2s*)(0x7fde7459e9b0)
> $1 = {cs = 0x7fde5c02d260, sess = 0x5628283bc740 <pool_cache+64>, h2c = 
> 0x5628295cbb80, h1m = {state = H1_MSG_RPBEFORE, flags = 12, curr_len = 0,
>     body_len = 0, next = 0, err_pos = -1, err_state = 0}, by_id = {node = 
> {branches = {b = {0x0, 0x7fde3c2c6c60}}, node_p = 0x0,
>       leaf_p = 0x5628295cc018, bit = -5624, pfx = 29785}, key = 11}, id = 11, 
> flags = 28673, sws = -4060, errcode = H2_ERR_NO_ERROR, st = H2_SS_HREM,
>   status = 200, body_len = 0, rxbuf = {size = 0, area = 0x0, data = 0, head = 
> 0}, dummy0 = 0x0, dummy1 = 0x0, subs = 0xffffffff, list = {
>     n = 0x7fde7459ea68, p = 0x7fde7459ea68}, shut_tl = 0x5628297eeaf0}
> 
> it crashes like above until commit: 
> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba
>  
> <http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba>
>  which alters tasks processing.
> 
> 
> pon., 2 lis 2020 o 15:46 Maciej Zdeb <[email protected] <mailto:[email protected]>> 
> napisał(a):
> I'm wondering, the corrupted address was always at "wait_event" in h2s 
> struct, after its removal in: 
> http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37
>  
> <http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37>
>  crashes went away.
> 
> But with the above patch and after altering h2s struct into:
> struct h2s {
>         [...]
>         struct tasklet *shut_tl;
>         struct wait_event *recv_wait; /* recv wait_event the conn_stream 
> associated is waiting on (via h2_subscribe) */
>         struct wait_event *send_wait; /* send wait_event the conn_stream 
> associated is waiting on (via h2_subscribe) */
>         struct list list; /* To be used when adding in h2c->send_list or 
> h2c->fctl_lsit */
> };
> 
> the crash returned.
> 
> However after recv_wait and send_wait were merged in: 
> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca
>  
> <http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca>
>  crashes went away again.
> 
> In my opinion shut_tl should be corrupted again, but it is not. Maybe the 
> last patch fixed it?
> 
> pon., 2 lis 2020 o 15:37 Kirill A. Korinsky <[email protected] 
> <mailto:[email protected]>> napisał(a):
> Maciej,
> 
> Looks like memory corruption is still here but it corrupt just some another 
> place.
> 
> Willy do you agree?
> 
> --
> wbr, Kirill
> 
>> On 2. Nov 2020, at 15:34, Maciej Zdeb <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> So after Kirill suggestion to modify h2s struct in a way that tasklet 
>> "shut_tl" is before recv_wait I verified if in 2.2.4 the same crash will 
>> occur aaaand it did not!
>> 
>> After the patch that merges recv_wait and send_wait: 
>> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca
>>  
>> <http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca>
>> and witch such h2s (tasklet shut_tl before wait_event subs) the crashes are 
>> gone:
>> 
>> struct h2s {
>>         [...]
>>         struct buffer rxbuf; /* receive buffer, always valid (buf_empty or 
>> real buffer) */
>>         struct tasklet *shut_tl;  /* deferred shutdown tasklet, to retry to 
>> send an RST after we failed to,
>>                                    * in case there's no other subscription 
>> to do it */
>>         struct wait_event *subs;      /* recv wait_event the conn_stream 
>> associated is waiting on (via h2_subscribe) */
>>         struct list list; /* To be used when adding in h2c->send_list or 
>> h2c->fctl_lsit */
>> };
>> 
>> 
>> 
>> pon., 2 lis 2020 o 12:42 Maciej Zdeb <[email protected] 
>> <mailto:[email protected]>> napisał(a):
>> Great idea Kirill,
>> 
>> With such modification:
>> 
>> struct h2s {
>>         [...]
>>         struct tasklet *shut_tl;
>>         struct wait_event *recv_wait; /* recv wait_event the conn_stream 
>> associated is waiting on (via h2_subscribe) */
>>         struct wait_event *send_wait; /* send wait_event the conn_stream 
>> associated is waiting on (via h2_subscribe) */
>>         struct list list; /* To be used when adding in h2c->send_list or 
>> h2c->fctl_lsit */
>> };
>> 
>> it crashed just like before.
>> 
>> pon., 2 lis 2020 o 11:12 Kirill A. Korinsky <[email protected] 
>> <mailto:[email protected]>> napisał(a):
>> Hi,
>> 
>> Thanks for update.
>> 
>> After read Wully's recommendation and provided commit that fixed an issue 
>> I'm curious can you "edit" a bit this commit and move `shut_tl` before 
>> `recv_wait` instead of removed `wait_event`?
>> 
>> It is a quiet dummy way to confirm that memory corruption had gone, and not 
>> just moved to somewhere else.
>> 
>> --
>> wbr, Kirill
>> 
>>> On 2. Nov 2020, at 10:58, Maciej Zdeb <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Hi,
>>> 
>>> Update for people on the list that might be interested in the issue, 
>>> because part of discussion was private.
>>> 
>>> I wanted to check Willy suggestion and modified h2s struct (added dummy 
>>> fields):
>>> 
>>> struct h2s {
>>>         [...]
>>>         uint16_t status;     /* HTTP response status */
>>>         unsigned long long body_len; /* remaining body length according to 
>>> content-length if H2_SF_DATA_CLEN */
>>>         struct buffer rxbuf; /* receive buffer, always valid (buf_empty or 
>>> real buffer) */
>>>         int dummy0;
>>>         struct wait_event wait_event; /* Wait list, when we're attempting 
>>> to send a RST but we can't send */
>>>         int dummy1;
>>>         struct wait_event *recv_wait; /* recv wait_event the conn_stream 
>>> associated is waiting on (via h2_subscribe) */
>>>         int dummy2;
>>>         struct wait_event *send_wait; /* send wait_event the conn_stream 
>>> associated is waiting on (via h2_subscribe) */
>>>         int dummy3;
>>>         struct list list; /* To be used when adding in h2c->send_list or 
>>> h2c->fctl_lsit */
>>>         struct list sending_list; /* To be used when adding in 
>>> h2c->sending_list */
>>> };
>>> 
>>> With such modified h2s struct, the crash did not occur.
>>> 
>>> I've checked HAProxy 2.1, it crashes like 2.0.
>>> 
>>> I've also checked 2.2, bisection showed that this commit: 
>>> http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37
>>>  
>>> <http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37>
>>>  fixed the crashes we experienced. I'm not sure how/if it fixed the memory 
>>> corruption, it is possible that memory is still corrupted but not causing 
>>> the crash.
>>> 
>>> 
>>> 
>>> pt., 25 wrz 2020 o 16:25 Kirill A. Korinsky <[email protected] 
>>> <mailto:[email protected]>> napisał(a):
>>> Very interesting.
>>> 
>>> Anyway, I can see that this pice of code was refactored some time ago: 
>>> https://github.com/haproxy/haproxy/commit/f96508aae6b49277dcf142caa35042678cf8e2ca
>>>  
>>> <https://github.com/haproxy/haproxy/commit/f96508aae6b49277dcf142caa35042678cf8e2ca>
>>> 
>>> Maybe it is worth to try 2.2 or 2.3 branch?
>>> 
>>> Yes, it is a blind shot and just a guess.
>>> 
>>> --
>>> wbr, Kirill
>>> 
>>>> On 25. Sep 2020, at 16:01, Maciej Zdeb <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Yes at the same place with same value:
>>>> 
>>>> (gdb) bt full
>>>> #0  0x0000559ce98b964b in h2s_notify_recv (h2s=0x559cef94e7e0) at 
>>>> src/mux_h2.c:783
>>>>         sw = 0xffffffff
>>>> 
>>>> 
>>>> 
>>>> pt., 25 wrz 2020 o 15:42 Kirill A. Korinsky <[email protected] 
>>>> <mailto:[email protected]>> napisał(a):
>>>> > On 25. Sep 2020, at 15:26, Maciej Zdeb <[email protected] 
>>>> > <mailto:[email protected]>> wrote:
>>>> >
>>>> > I was mailing outside the list with Willy and Christopher but it's worth 
>>>> > sharing that the problem occurs even with nbthread = 1. I've managed to 
>>>> > confirm it today.
>>>> 
>>>> 
>>>> I'm curious is it crashed at the same place with the same value?
>>>> 
>>>> --
>>>> wbr, Kirill
>>>> 
>>>> 
>>> 
>> 
> 

Attachment: signature.asc
Description: Message signed with OpenPGP

Reply via email to