Hello *,
after experimenting with different MTU sizes and pf normalisation rules,
I am getting the feeling that the root cause lies somewhere near path MTU
discovery, perhaps in combination with IPsec.
These are the console log messages of another crash observed in the meantime:
kernel: double fault trap, code=0
Stopped at rtable_l2+0xf: pushq %rdi
ddb{0}> trace
rtable_l2(0) at rtable_l2+0xf
pf_setup_pdesc(ffff8000210e40a8,2,2,ffff80000016c400,fffffd806ee32e00,fffff80000210e41be)
at pf_setup_pdesc+0x7d
pf_test(2,2,ffff80000013f000,ffff8000210e4290) at pf_test+0xfe
ip_output(fffffd806ee32e00,0,fffffd807d95a5f8,800,0,fffffd807d95a588) at
ip_output+0x7cf
tcp_output(ffff800000551980) at tcp_output+0x15c1
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
tcp_output(ffff800000551980) at tcp_output+0x1914
[... some identical lines omitted...]
tcp_timer_rexmt(ffff800000551980) at tcp_timer_rexmt+0x3f5
softclock_thread(ffff8000210d2c58) at softclock_thread+0xfb
end trace frame: 0x0, count: -50
While the first few frames differ from crash to crash, the tcp_output(...),
tcp_timer_rexmt(...) and softclock_thread(...) frames always stay the same.
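If I read this correctly, it matches the stack exhaustion bluhm pointed out
below: tcp_output() runs into EMSGSIZE, calls tcp_mtudisc(), and that in turn
re-enters tcp_output(), one stack frame deeper each time. My rough
understanding of the cycle (a simplified sketch of my interpretation, not the
actual code paths):

/*
 * softclock_thread()
 *   tcp_timer_rexmt()        retransmission timer fires
 *     tcp_output()           tries to (re)send the segment
 *       ip_output()          fails with EMSGSIZE (segment > path MTU)
 *       tcp_mtudisc()        called by tcp_output() on EMSGSIZE
 *         tcp_output()       retransmits immediately ...
 *           ip_output()      ... which fails with EMSGSIZE again
 *           tcp_mtudisc()    and so on, one level deeper each round,
 *             ...            until the kernel stack is exhausted and
 *                            the machine double faults
 */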
At the moment, by reducing the MTU of my vio0 interface to 1488 bytes and
attempting to clear the DF flag on packets carrying IPsec payload traffic
(/etc/pf.conf snippet: "match on enc0 scrub (max-mss 1360 random-id no-df)"),
I have managed to delay the crashes from ~ 30 minutes to a few hours in
production use. Again, there is no problem if the machine is running idle.
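Spelled out as configuration, the mitigation boils down to the following
(assuming the MTU is set persistently via /etc/hostname.vio0; the pf rule is
the one quoted above):

# /etc/hostname.vio0 (excerpt, addressing omitted)
mtu 1488

# /etc/pf.conf (excerpt): clamp MSS and clear DF on IPsec payload traffic
match on enc0 scrub (max-mss 1360 random-id no-df)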
Since these crashes keep happening and I am out of ideas by now, I wonder
whether anybody is successfully running a Squid upstream proxy in combination
with an IPsec site-to-site connection on the same machine.
Thanks, and best regards,
Peter Müller
> Hello Alexander,
>
> thank you for your reply. Is there anything I can do about this
> like modifying configurations or provide further information?
>
> Thanks, and best regards,
> Peter Müller
>
>
>> On Fri, Jan 31, 2020 at 03:21:00PM +0000, Peter Müller wrote:
>>> tcp_output(ffff800000584ee0) at tcp_output+0x1941
>>> tcp_output(ffff800000584ee0) at tcp_output+0x1941
>>> tcp_output(ffff800000584ee0) at tcp_output+0x1941
>>
>> Looks like stack exhaustion. tcp_output() calls tcp_mtudisc() calls
>> tcp_output().
>>
>> /usr/src/sys/netinet/tcp_output.c:1084
>>
>> if (error == EMSGSIZE) {
>>         /*
>>          * ip_output() will have already fixed the route
>>          * for us. tcp_mtudisc() will, as its last action,
>>          * initiate retransmission, so it is important to
>>          * not do so here.
>>          */
>>         tcp_mtudisc(tp->t_inpcb, -1);
>>         return (0);
>> }
>>
>> bluhm
>>
>