On 22.11.2022. 18:48, Josmar Pierri wrote:
> I upgraded to 7.2 snapshot #849 early this morning, but it crashed
> twice in a few hours.
> This time, however, the panic message is different:
> 

Could you compile a kernel with this diff?
https://www.mail-archive.com/[email protected]/msg72582.html

At least for me, that diff makes my firewall stable.
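
For reference, a minimal sketch of applying a diff and rebuilding the kernel, following the usual steps from release(8). The diff location (/tmp/pfsync.diff), the directory it is applied from, and the -p0 strip level are assumptions; check the paths at the top of the diff and adjust. The run helper and DRY_RUN variable are only illustration, so the steps can be printed before anything is changed.

```shell
#!/bin/sh
# Sketch only: apply a kernel diff and rebuild GENERIC.MP on OpenBSD.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    echo "+ $*"
    # In dry-run mode nothing is executed; otherwise run the command as given.
    [ -n "$DRY_RUN" ] || "$@"
}

build_patched_kernel() {
    diff_file=$1                  # hypothetical path to the saved diff
    src_dir=${2:-/usr/src/sys}    # assumption: diff paths are relative to sys/
    arch=$(machine 2>/dev/null || uname -m)

    run cd "$src_dir" &&
    run patch -p0 -i "$diff_file" &&   # assumption: -p0 matches the diff
    run cd "$src_dir/arch/$arch/compile/GENERIC.MP" &&
    run make obj &&
    run make config &&
    run make &&
    run make install                   # needs root; reboot afterwards
}
```

Running `DRY_RUN=1 build_patched_kernel /tmp/pfsync.diff` first prints the steps without changing anything; the real run has to be done as root, followed by a reboot into the new kernel.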

> uvm_fault(0xffffffff8236dcb8, 0x17, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at         pfsync_q_del+0x96:    movq      %rdx,0x8(%rax)
>     TID       PID      UID      PRFLAGS      PFLAGS   CPU   COMMAND
>  436110  83038      0       0x14000          0x200          3     softnet
>  395295  39926      0       0x14000          0x200          0     softnet
>  189958   2208      0       0x14000          0x200          2     softnet
> * 65839    5423      0       0x14000          0x200          1     systqmp
> pfsync_q_del(fffffd8401d63890) at pfsync_q_del+0x96
> pfsync_delete_state(fffffd8401d63890) at pfsync_delete_state+0x118
> pf_remove_state(fffffd8401d63890) at pfsync_remove_state+0x14b
> pf_purge_expired_states(4031,40) at pf_purge_expired_states+0x242
> pf_purge_states(0) at pf_purge_states+0x1c
> taskq_thread(ffffffff822a1a10) at taskq_thread+0x100
> end trace frame: 0x0, count: 9
> 
> This is all I could manage to get, since the crash happened while I was
> away (and that stupid Dell console times out when idle, removing the USB
> keyboard).
> 
> I noticed something that may or may not be related to this issue: the
> "output fail" counter keeps increasing steadily on both the aggregate and
> the two member interfaces:
> 
> :~# netstat -i -I aggr0
> Name    Mtu   Network     Address              Ipkts Ifail    Opkts Ofail Colls
> aggr0   9200  <Link>      fe:e1:ba:d0:91:13 224426940     0 200785282   357     0
> 
> At first I thought it could be something related to the switches but I
> still haven't found anything wrong with them.
> 
> 
> 
> On Mon, Nov 21, 2022 at 1:22 PM Hrvoje Popovski <[email protected]> wrote:
>>
>> On 21.11.2022. 16:04, Josmar Pierri wrote:
>>> Hi,
>>>
>>> I managed to get screenshots of a random kernel panic that we are
>>> having on a server here.
>>> They were taken using a console management tool embedded into the
>>> server (Dell IDRAC) and are PNG images of the panic itself, trace of
>>> all cpus and ps.
>>> I'm not attaching them here right now because I don't know how the
>>> list would react to them.
>>>
>>> I attached the output of:
>>> 1 - sendbug -P
>>> 2 - dmesg right after reboot
>>> 3 - dmesg-boot
>>>
>>> This server has an aggr0 grouping bnxt0 and bnxt1, both at 10 Gbps.
>>> Its task is to load-balance RDP traffic (TCP 3389) among 2 large pools
>>> (more than 50 servers on each one) and 3 small ones using pf (tables)
>>> for that.
>>>
>>> These panics happen at random times without an apparent cause.
>>>
>>> The panic message reads:
>>>
>>> ddb{3}> show panic
>>> *cpu3: kernel diagnostic assertion "st->snapped == 0" failed: file
>>> "/usr/src/sys/net/if_pfsync.c", line 1591
>>>  cpu2: kernel diagnostic assertion "st->snapped == 0" failed: file
>>> "/usr/src/sys/net/if_pfsync.c", line 1591
>>>  cpu1: kernel diagnostic assertion "st->snapped == 0" failed: file
>>> "/usr/src/sys/net/if_pfsync.c", line 1591
>>> ddb{3}>
>>>
>>> Please advise how I should proceed to submit the screenshots.
>>
>> Hi,
>>
>> I have a similar setup with aggr grouping ix0 and ix1 and pfsync. If you
>> have two firewalls, can you sysupgrade this one to the latest snapshot?
>>
>> I'm running a snapshot from after the last hackathon with this diff
>> https://www.mail-archive.com/[email protected]/msg72582.html
>>
>> and for now the firewall seems to work just fine.
>>
>>
>>
> 
