Hi Willy! So, we deployed the new shiny 3.1.0 version of HAProxy add added the logging you suggested.
We use this log format: log-format "Client: %ci:%cp | Time: [%t] | Frontend: %f | Backend: %b | Server: %s | Frontend Error: %[fc_err_name] | Backend Error: %[bc_err_name] | Frontend Debug: %[fs.debug_str] | Backend Debug: %[bs.debug_str] | Request: \"%r\"" And this is the result of some of the requests where the stream was closed by an RST_STREAM from HAProxy: Client: 193.0.241.14:10505 | Time: [28/Nov/2024:11:30:49.331] | Frontend: ft_web | Backend: bk_appsrv | Server: server1 | Frontend Error: - | Backend Error: - | Frontend Debug: h2s.id=15 .st=CLO .flg=0x4005 .rxwin=0 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x7f776011cd60(.flg=0x00074086 .app=0x7f7760086670) .sd=0x7f77600daad0(.flg=0x5041d601) .subs=(nil) h2c.st0=FRH .err=0 .maxid=27 .lastid=-1 .flg=0x410200 .nbst=3 .nbsc=6 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=6 .orph_cnt=0 .sub=1 .dsi=27 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=0x7f7760416280 .exp=<NEVER> conn.flg=0xa0000300 | Backend Debug: h2s.id=5 .st=CLO .flg=0x300a .rxwin=1048576 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x7f7760053180(.flg=0x0003d413 .app=0x7f7760086670) .sd=0x7f77600a2420(.flg=0x50420a01) .subs=(nil) h2c.st0=FRH .err=0 .maxid=7 .lastid=-1 .flg=0x18e00 .nbst=1 .nbsc=2 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=2 .orph_cnt=0 .sub=1 .dsi=3 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[13@0x7f77603634a0+0/16384],t=[13@0x7f77603634a0+0/16384] .task=0x7f77601cf220 .exp=<NEVER> conn.flg=0x000300 | Request: "GET https://www.mydomain.com/lib/select2/js/select2.min.js HTTP/2.0" Client: 176.87.9.172:11004 | Time: [28/Nov/2024:11:40:51.466] | Frontend: ft_web | Backend: bk_appsrv | Server: server1 | Frontend Error: - | Backend Error: - | Frontend Debug: h2s.id=7 .st=CLO .flg=0x4005 .rxwin=0 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x5611a99ff9d0(.flg=0x00074086 .app=0x5611a9755b30) .sd=0x5611a9a22e40(.flg=0x5041d601) .subs=(nil) h2c.st0=FRH .err=0 .maxid=11 .lastid=-1 .flg=0x0600 .nbst=1 .nbsc=2 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=2 .orph_cnt=0 .sub=1 .dsi=11 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=0x5611a9a01b60 .exp=<NEVER> conn.flg=0xa0000300 | Backend Debug: h2s.id=83 .st=CLO .flg=0x300a .rxwin=1048576 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x5611a97ba600(.flg=0x0003d413 .app=0x5611a9755b30) .sd=0x5611a9870280(.flg=0x50420a01) .subs=(nil) h2c.st0=FRH .err=0 .maxid=85 .lastid=-1 .flg=0x18e00 .nbst=0 .nbsc=1 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=81 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[13@0x5611a97e3f30+0/16384],t=[13@0x5611a97e3f30+0/16384] .task=0x5611a9a40cc0 .exp=<NEVER> conn.flg=0x000300 | Request: "GET https://www.mydomain.com/fetch HTTP/2.0" Client: 192.54.145.137:50920 | Time: [28/Nov/2024:11:45:36.627] | Frontend: ft_web | Backend: bk_appsrv | Server: server1 | Frontend Error: - | Backend Error: - | Frontend Debug: h2s.id=27 .st=CLO .flg=0x4005 .rxwin=0 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x7f777419c520(.flg=0x00074086 .app=0x7f777404dc70) .sd=0x7f77740ae7b0(.flg=0x5041d601) .subs=(nil) h2c.st0=FRH .err=0 .maxid=27 .lastid=-1 .flg=0x0200 .nbst=0 .nbsc=1 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=27 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=0x5611a9824980 .exp=<NEVER> conn.flg=0xa0000300 | Backend Debug: h2s.id=35 .st=CLO .flg=0x300a .rxwin=1048576 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x7f777418b320(.flg=0x0003d413 .app=0x7f777404dc70) .sd=0x7f77740441b0(.flg=0x50420a01) .subs=(nil) h2c.st0=FRH .err=0 .maxid=35 .lastid=-1 .flg=0x18e00 .nbst=0 .nbsc=1 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=33 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[13@0x7f77742024b0+0/16384],t=[13@0x7f77742024b0+0/16384] .task=0x7f77740c2b20 .exp=<NEVER> conn.flg=0x000300 | Request: "GET https://www.mydomain.com/lib/polyfills.js HTTP/2.0" Does this help at all? Something else we should log or try? /Andreas ________________________________ From: Willy Tarreau <w...@1wt.eu> Sent: Wednesday, November 13, 2024 14:31 To: Andreas Bergander <andreas.bergan...@benify.com> Cc: haproxy@formilux.org <haproxy@formilux.org> Subject: Re: RST_STREAM immediately after GET request in h2 Hi Andreas, On Wed, Nov 13, 2024 at 10:25:15AM +0000, Andreas Bergander wrote: > Hi Willy! > > Thanks for the great pointers and explanations! We do not have the > abortonclose option set, the only options we use are for access logging > (httpslog). OK! > These are our timeout settings: > > timeout connect 5s > timeout client 20s > timeout client-fin 20s > timeout server 1200s > timeout tunnel 1h > timeout check 5s > timeout http-keep-alive 10s > timeout http-request 200s > > > Currently we can only observer the issue in a production environment with > plenty of traffic, so logging might be problematic to turn on there. I'll see > if we can trigger the issue in a test system with logging. Then one thing that might be doable *if you're willing to give 3.1-dev a try* would be to add just this to your existing logs: fc_err=%[fc_err_name] bc_err=%[bc_err_name] fs_dbg=[%[fs.debug_str]] bs_dbg=[%[bs.debug_str]] These will complete your existing traffic logs with extra info indicating how the connection ended on each side. The fc/bc_err_name will be a single word and is short. fs/bs_dbg will be larger, ~50 chars. If that's too much, maybe just keeping the back first could reveal what's happening, but we don't know what we're looking for so it's hard to say (at least if you have to keep only one side, the backend is more important). If that's still too much, you can append ",when(error)" to strip these outputs when there's no error: fc_err=%[fc_err_name] bc_err=%[bc_err_name] fs_dbg=[%[fs.debug_str,when(error)]] bs_dbg=[%[bs.debug_str,when(error)]] >From that point, the next time you spot issues in your logs, these logs should help a lot. If you can't give 3.1-dev a try (which I perfectly understand but I preferred to suggest), this will need to wait for us to manage to backport these to 3.0. Cheers, Willy