Re: kernel diagnostic assertion "st->timeout == PFTM_UNLINKED" failed: file "

Johan Huldtgren Wed, 29 Nov 2023 20:25:29 -0800

hello,

On 2023-11-04 15:26, Alexandr Nedvedicky wrote:
> Hello Johan,
> 
> On Sat, Nov 04, 2023 at 10:01:06AM -0400, Johan Huldtgren wrote:
> > hello,
> > 
> > On 2023-11-03 19:10, Alexandr Nedvedicky wrote:
> > > Hello Johan,
> > > 
> > > 
> > > On Fri, Nov 03, 2023 at 12:27:53PM -0400, Johan Huldtgren wrote:
> > > </snip>
> > > > 
> > > > so this box just has the default (from when it was installed) ruleset. 
> > > > 
> > > > $ doas cat /etc/pf.conf
> > > > #       $OpenBSD: pf.conf,v 1.55 2017/12/03 20:40:04 sthen Exp $
> > > > #
> > > > # See pf.conf(5) and /etc/examples/pf.conf
> > > > 
> > > > set skip on lo
> > > > set state-defaults pflow
> > > > 
> > > > block return    # block stateless traffic
> > > > pass            # establish keep-state
> > > > 
> > > > # By default, do not permit remote connections to X11
> > > > block return in on ! lo0 proto tcp to port 6000:6010
> > > > 
> > > > # Port build user does not need network
> > > > block return out log proto {tcp udp} user _pbuild
> > > > 
> > > 
> > > So that's surprising then... Looks like you are very lucky
> > > to hit the ASSERT. I'm surprised we have not seen it earlier.
> > > 
> > > Diff below makes sure pf_test() function does not overwrite
> > > timeout member in pf_state structure when timeout is set
> > > to PFTM_UNLINKED already. We also modify/update timeout member
> > > under protection of state mutex (pf_state::mtx).
> > > 
> > > 
> > > Can you test the diff below? It applies to current as well as to 7.4
> > 
> > I've rebuilt with your diff, as the panic was seemingly random I'm not
> > sure how I can test, but I'll let this system run with your patch and 
> > report any issues should I see them. If you have any specific things
> > you'd like me to try don't hesitate to let me know. dmesg below for
> > completeness' sake.
> > 
> > thanks again,
> > 
> 
>     I'm afraid there is nothing more to do than keep an eye on your
>     system. I think what really increased the chance here is the number
>     of CPUs your box has.
> 
>     It is OK if you can come back with a report early in December to let
>     us know if it helps or if there are more similar issues (which I'm
>     sure there are still some left).


so my machine panicked today, but the panic this time is completely different.
I don't know if it's related to this issue, the patch, or a completely new
issue, but I figured I'd start by reporting it here. Unfortunately, when I tried
to switch CPUs to collect traces from the other ones the machine froze and I was
forced to power cycle it. So I have the panic and initial trace but that's it.

panic: ip_output no HDR
Stopped at      db_enter+0x14:  popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  74003  25022      0        0x10          0    2  afpd
 355827  29745    107   0x1100002  0x4000000    3  vmd
 451006  29745    107   0x1100002  0x4000000    4  vmd
 131508  78367    107   0x1100002  0x4000000    5  vmd
 112644  78367    107   0x1100002  0x4000000    1  vmd
*133058  91446      0     0x14000      0x200    0  softnet0
db_enter() at db_enter+0x14
panic(ffffffff820c20df) at panic+0xc3
ip_output(fffffd8076b76e00,0,fffffd9c9e59e708,0,0,fffffd9c9e59e690,e4a23bf8c0204936) at ip_output+0xa26
udp_output(fffffd9c9e59e690,fffffd8076b76e00,fffffd8079d14b00,0) at udp_output+0x3be
sosend(fffffd9c9e59f000,fffffd8079d14b00,0,fffffd8076b76e00,0,0) at sosend+0x37f
pflow_output_process(ffff8000011a0800) at pflow_output_process+0x67
taskq_thread(ffff800000035200) at taskq_thread+0x100
end trace frame: 0x0, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}>

ddb{0}> show panic
*cpu0: ip_output no HDR

ddb{0}> trace
db_enter() at db_enter+0x14
panic(ffffffff820c20df) at panic+0xc3
ip_output(fffffd8076b76e00,0,fffffd9c9e59e708,0,0,fffffd9c9e59e690,e4a23bf8c0204936) at ip_output+0xa26
udp_output(fffffd9c9e59e690,fffffd8076b76e00,fffffd8079d14b00,0) at udp_output+0x3be
sosend(fffffd9c9e59f000,fffffd8079d14b00,0,fffffd8076b76e00,0,0) at sosend+0x37f
pflow_output_process(ffff8000011a0800) at pflow_output_process+0x67
taskq_thread(ffff800000035200) at taskq_thread+0x100
end trace frame: 0x0, count: -7

thanks,

.jh
