Hey, I'm wondering, does it related to this code:
+ /* some tasks may have woken other ones up */ + if (max_processed && thread_has_tasks()) + goto not_done_yet; + from http://git.haproxy.org/?p=haproxy-2.2.git;a=blobdiff;f=src/task.c;h=500223f185bf324c0adb34a42ec0244e638ce63e;hp=1a7f44d9169e0a01d42ba13d8d335102aa43577b;hb=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba;hpb=77015abe0bcfde67bff519b1d48393a513015f77 <http://git.haproxy.org/?p=haproxy-2.2.git;a=blobdiff;f=src/task.c;h=500223f185bf324c0adb34a42ec0244e638ce63e;hp=1a7f44d9169e0a01d42ba13d8d335102aa43577b;hb=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba;hpb=77015abe0bcfde67bff519b1d48393a513015f77> as far as I understand it should be safe to remove (with not_done_yet label). Can you try it? -- wbr, Kirill > On 3. Nov 2020, at 15:15, Maciej Zdeb <[email protected]> wrote: > > I modified h2s struct in 2.2 branch with HEAD set to > f96508aae6b49277dcf142caa35042678cf8e2ca "MEDIUM: mux-h2: merge recv_wait and > send_wait event notifications" like below (subs is in exact place of removed > wait_event): > > struct h2s { > [...] > struct tasklet *dummy0; > struct wait_event *dummy1; > struct wait_event *subs; /* recv wait_event the conn_stream > associated is waiting on (via h2_subscribe) */ > struct list list; /* To be used when adding in h2c->send_list or > h2c->fctl_lsit */ > struct tasklet *shut_tl; /* deferred shutdown tasklet, to retry to > send an RST after we failed to, > * in case there's no other subscription to > do it */ > } > > it crashed like before with subs = 0xffffffff: > > (gdb) p *(struct h2s*)(0x7fde7459e9b0) > $1 = {cs = 0x7fde5c02d260, sess = 0x5628283bc740 <pool_cache+64>, h2c = > 0x5628295cbb80, h1m = {state = H1_MSG_RPBEFORE, flags = 12, curr_len = 0, > body_len = 0, next = 0, err_pos = -1, err_state = 0}, by_id = {node = > {branches = {b = {0x0, 0x7fde3c2c6c60}}, node_p = 0x0, > leaf_p = 0x5628295cc018, bit = -5624, pfx = 29785}, key = 11}, id = 11, > flags = 28673, sws = -4060, errcode = H2_ERR_NO_ERROR, st = H2_SS_HREM, > status = 200, body_len = 0, rxbuf = {size = 0, area = 0x0, data = 0, head = > 0}, dummy0 = 0x0, dummy1 = 0x0, subs = 0xffffffff, list = { > n = 0x7fde7459ea68, p = 0x7fde7459ea68}, shut_tl = 0x5628297eeaf0} > > it crashes like above until commit: > http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba > > <http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba> > which alters tasks processing. > > > pon., 2 lis 2020 o 15:46 Maciej Zdeb <[email protected] <mailto:[email protected]>> > napisał(a): > I'm wondering, the corrupted address was always at "wait_event" in h2s > struct, after its removal in: > http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37 > > <http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37> > crashes went away. > > But with the above patch and after altering h2s struct into: > struct h2s { > [...] > struct tasklet *shut_tl; > struct wait_event *recv_wait; /* recv wait_event the conn_stream > associated is waiting on (via h2_subscribe) */ > struct wait_event *send_wait; /* send wait_event the conn_stream > associated is waiting on (via h2_subscribe) */ > struct list list; /* To be used when adding in h2c->send_list or > h2c->fctl_lsit */ > }; > > the crash returned. > > However after recv_wait and send_wait were merged in: > http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca > > <http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca> > crashes went away again. > > In my opinion shut_tl should be corrupted again, but it is not. Maybe the > last patch fixed it? > > pon., 2 lis 2020 o 15:37 Kirill A. Korinsky <[email protected] > <mailto:[email protected]>> napisał(a): > Maciej, > > Looks like memory corruption is still here but it corrupt just some another > place. > > Willy do you agree? > > -- > wbr, Kirill > >> On 2. Nov 2020, at 15:34, Maciej Zdeb <[email protected] >> <mailto:[email protected]>> wrote: >> >> So after Kirill suggestion to modify h2s struct in a way that tasklet >> "shut_tl" is before recv_wait I verified if in 2.2.4 the same crash will >> occur aaaand it did not! >> >> After the patch that merges recv_wait and send_wait: >> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca >> >> <http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca> >> and witch such h2s (tasklet shut_tl before wait_event subs) the crashes are >> gone: >> >> struct h2s { >> [...] >> struct buffer rxbuf; /* receive buffer, always valid (buf_empty or >> real buffer) */ >> struct tasklet *shut_tl; /* deferred shutdown tasklet, to retry to >> send an RST after we failed to, >> * in case there's no other subscription >> to do it */ >> struct wait_event *subs; /* recv wait_event the conn_stream >> associated is waiting on (via h2_subscribe) */ >> struct list list; /* To be used when adding in h2c->send_list or >> h2c->fctl_lsit */ >> }; >> >> >> >> pon., 2 lis 2020 o 12:42 Maciej Zdeb <[email protected] >> <mailto:[email protected]>> napisał(a): >> Great idea Kirill, >> >> With such modification: >> >> struct h2s { >> [...] >> struct tasklet *shut_tl; >> struct wait_event *recv_wait; /* recv wait_event the conn_stream >> associated is waiting on (via h2_subscribe) */ >> struct wait_event *send_wait; /* send wait_event the conn_stream >> associated is waiting on (via h2_subscribe) */ >> struct list list; /* To be used when adding in h2c->send_list or >> h2c->fctl_lsit */ >> }; >> >> it crashed just like before. >> >> pon., 2 lis 2020 o 11:12 Kirill A. Korinsky <[email protected] >> <mailto:[email protected]>> napisał(a): >> Hi, >> >> Thanks for update. >> >> After read Wully's recommendation and provided commit that fixed an issue >> I'm curious can you "edit" a bit this commit and move `shut_tl` before >> `recv_wait` instead of removed `wait_event`? >> >> It is a quiet dummy way to confirm that memory corruption had gone, and not >> just moved to somewhere else. >> >> -- >> wbr, Kirill >> >>> On 2. Nov 2020, at 10:58, Maciej Zdeb <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> Hi, >>> >>> Update for people on the list that might be interested in the issue, >>> because part of discussion was private. >>> >>> I wanted to check Willy suggestion and modified h2s struct (added dummy >>> fields): >>> >>> struct h2s { >>> [...] >>> uint16_t status; /* HTTP response status */ >>> unsigned long long body_len; /* remaining body length according to >>> content-length if H2_SF_DATA_CLEN */ >>> struct buffer rxbuf; /* receive buffer, always valid (buf_empty or >>> real buffer) */ >>> int dummy0; >>> struct wait_event wait_event; /* Wait list, when we're attempting >>> to send a RST but we can't send */ >>> int dummy1; >>> struct wait_event *recv_wait; /* recv wait_event the conn_stream >>> associated is waiting on (via h2_subscribe) */ >>> int dummy2; >>> struct wait_event *send_wait; /* send wait_event the conn_stream >>> associated is waiting on (via h2_subscribe) */ >>> int dummy3; >>> struct list list; /* To be used when adding in h2c->send_list or >>> h2c->fctl_lsit */ >>> struct list sending_list; /* To be used when adding in >>> h2c->sending_list */ >>> }; >>> >>> With such modified h2s struct, the crash did not occur. >>> >>> I've checked HAProxy 2.1, it crashes like 2.0. >>> >>> I've also checked 2.2, bisection showed that this commit: >>> http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37 >>> >>> <http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37> >>> fixed the crashes we experienced. I'm not sure how/if it fixed the memory >>> corruption, it is possible that memory is still corrupted but not causing >>> the crash. >>> >>> >>> >>> pt., 25 wrz 2020 o 16:25 Kirill A. Korinsky <[email protected] >>> <mailto:[email protected]>> napisał(a): >>> Very interesting. >>> >>> Anyway, I can see that this pice of code was refactored some time ago: >>> https://github.com/haproxy/haproxy/commit/f96508aae6b49277dcf142caa35042678cf8e2ca >>> >>> <https://github.com/haproxy/haproxy/commit/f96508aae6b49277dcf142caa35042678cf8e2ca> >>> >>> Maybe it is worth to try 2.2 or 2.3 branch? >>> >>> Yes, it is a blind shot and just a guess. >>> >>> -- >>> wbr, Kirill >>> >>>> On 25. Sep 2020, at 16:01, Maciej Zdeb <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> >>>> Yes at the same place with same value: >>>> >>>> (gdb) bt full >>>> #0 0x0000559ce98b964b in h2s_notify_recv (h2s=0x559cef94e7e0) at >>>> src/mux_h2.c:783 >>>> sw = 0xffffffff >>>> >>>> >>>> >>>> pt., 25 wrz 2020 o 15:42 Kirill A. Korinsky <[email protected] >>>> <mailto:[email protected]>> napisał(a): >>>> > On 25. Sep 2020, at 15:26, Maciej Zdeb <[email protected] >>>> > <mailto:[email protected]>> wrote: >>>> > >>>> > I was mailing outside the list with Willy and Christopher but it's worth >>>> > sharing that the problem occurs even with nbthread = 1. I've managed to >>>> > confirm it today. >>>> >>>> >>>> I'm curious is it crashed at the same place with the same value? >>>> >>>> -- >>>> wbr, Kirill >>>> >>>> >>> >> >
signature.asc
Description: Message signed with OpenPGP

