On 19/05/24(Sun) 23:50, Vitaliy Makkoveev wrote:
> 
> 
> > On 19 May 2024, at 22:05, Anthony J. Bentley <bent...@openbsd.org> wrote:
> > 
> > Vitaliy Makkoveev writes:
> >>> On 17 May 2024, at 12:06, Stuart Henderson <s...@spacehopper.org> wrote:
> >>> 
> >>> There are problems with wg(4) that people with some workloads have been
> >>> seeing after upgrading past 7.3, though looking at this thread from when
> >>> it last came up https://marc.info/?t=170940892700001&r=1&w=2 I'm not
> >>> sure if we'd be expecting to see trouble on non-MP…
> >>> 
> >> 
> >> We do. The problem is not MP related.
> >> 
> >> Anthony, does the diff [1] help?
> >> 
> >> 1. https://marc.info/?l=openbsd-bugs&m=170980835807159&w=2
> > 
> > Crashes continue to occur with the same frequency after patching.
> > 
> 
> This could be a vio(4) bug. Please try this diff [1].
> 
> 1. https://marc.info/?l=openbsd-tech&m=171588941332420&w=2

The traces all point to a use-after-free of an mbuf that has been through
the wg(4) machinery.  The fact that the crash disappears on an SP system
suggests that this driver is not MP-safe and that a race somewhere ends up
corrupting memory associated with mbufs.
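
To illustrate why a stale or corrupted mbuf pointer would show up as a
pool panic at free time rather than at the point of corruption, here is a
toy userland sketch.  It is not the pool(9) code: toy_pool, toy_get and
toy_put are made-up names, and the only assumption carried over is that
the allocator checks, when an item is returned, that the address belongs
to one of its pages, which is what the "incorrect page" message suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model only: every name here is hypothetical and this is not the
 * kernel's pool(9) implementation.
 */
#define TOY_ITEMS	8
#define TOY_ITEMSZ	256

struct toy_pool {
	unsigned char	page[TOY_ITEMS * TOY_ITEMSZ];	/* backing "page" */
	unsigned char	inuse[TOY_ITEMS];		/* 1 if handed out */
};

enum toy_err { TOY_OK = 0, TOY_BAD_PAGE = -1, TOY_DOUBLE_FREE = -2 };

void
toy_init(struct toy_pool *pp)
{
	assert(pp != NULL);
	memset(pp, 0, sizeof(*pp));
}

/* Hand out the first free item, or NULL if the pool is exhausted. */
void *
toy_get(struct toy_pool *pp)
{
	size_t i;

	for (i = 0; i < TOY_ITEMS; i++) {
		if (!pp->inuse[i]) {
			pp->inuse[i] = 1;
			return (pp->page + i * TOY_ITEMSZ);
		}
	}
	return (NULL);
}

/*
 * Return an item.  A pointer outside the page, or one not aligned to an
 * item boundary, is rejected; so is an item that is already free.
 */
int
toy_put(struct toy_pool *pp, void *v)
{
	uintptr_t base = (uintptr_t)pp->page;
	uintptr_t addr = (uintptr_t)v;
	size_t off;

	if (addr < base || addr >= base + sizeof(pp->page))
		return (TOY_BAD_PAGE);
	off = addr - base;
	if (off % TOY_ITEMSZ != 0)
		return (TOY_BAD_PAGE);
	if (!pp->inuse[off / TOY_ITEMSZ])
		return (TOY_DOUBLE_FREE);
	pp->inuse[off / TOY_ITEMSZ] = 0;
	return (TOY_OK);
}
```

In this model, the code that frees the item is where the problem is
reported, even though whatever clobbered or recycled the pointer ran
earlier and elsewhere.  That would be consistent with the panic landing
in vio_txeof() while the damage happened on the wg(4) side.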

> > Here are three more crashes from running with the patch. I've seen
> > identical traces with and without the patch but these were not in
> > my last email.
> > 
> > kernel: page fault trap, code=0
> > Stopped at      schedclock+0x8a:        movzbl  0x344(%rax),%r13d
> > ddb> show panic
> > the kernel did not panic
> > ddb> trace
> > schedclock(ffff8000fffeaa68) at schedclock+0x8a
> > statclock(ffffffff82529bf8,ffff80001ca32a20,0) at statclock+0x129
> > clockintr_dispatch(ffff80001ca32a20) at clockintr_dispatch+0x30d
> > clockintr(ffff80001ca32a20) at clockintr+0x59
> > intr_handler(ffff80001ca32a20,ffff8000000e6000) at intr_handler+0x3c
> > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0x1a3
> > memset() at memset+0x5c
> > end trace frame: 0x0, count: -7
> > ddb> ps
> >   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
> > 
> > 
> > panic: pr_find_pagehead: mbufpl: incorrect page
> > Stopped at      db_enter+0x14:  popq    %rbp
> >    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_put+0x53
> > m_free(fffffd8028dbf600) at m_free+0xa6
> > m_freem(fffffd8028dbf600) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001ca7e7f0,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ed000) at wg_encap_worker+0x79
> > end trace frame: 0xffff80001ca7e9f0, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb> trace
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_put+0x53
> > m_free(fffffd8028dbf600) at m_free+0xa6
> > m_freem(fffffd8028dbf600) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001ca7e7f0,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ed000) at wg_encap_worker+0x79
> > taskq_thread(ffff80000088ac00) at taskq_thread+0xf0
> > end trace frame: 0x0, count: -15
> > ddb> show panic
> > *cpu0: pr_find_pagehead: mbufpl: incorrect page
> > ddb> ps
> >   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
> > 56587  470184  85475      0  3  0x18000083  dtread        btrace
> > 58952  222967      0     89  3  0x19100092  kqread        relayd
> > 83190  101464      0     89  3  0x19100092  kqread        relayd
> > ddb> show registers
> > rdi                              0x4
> > rsi                             0x14
> > rbp               0xffff80001ca7e4a0
> > rbx               0xfffffd8028dbf600
> > rdx                            0x3fd
> > rcx               0x4800000000000111
> > rax                             0x30
> > r8                 0x101010101010101
> > r9                                 0
> > r10               0x582c2a7821cc399f
> > r11               0xf4834d1e02cdca10
> > r12               0xfffffd8028dbf600
> > r13               0xffff800000024800
> > r14                                0
> > r15               0xffffffff82161d70    pp_r600_decoded_lanes+0xc8aa
> > rip               0xffffffff81fa1d44    db_enter+0x14
> > cs                               0x8
> > rflags                         0x282
> > rsp               0xffff80001ca7e4a0
> > ss                              0x10
> > db_enter+0x14:  popq    %rbp
> > 
> > 
> > panic: pr_find_pagehead: mbufpl: incorrect page
> > Stopped at      db_enter+0x14:  popq    %rbp
> >    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *225925  73351      0     0x14000      0x200    0  wg_crypt
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_put+0x53
> > m_free(fffffd8035fd9400) at m_free+0xa6
> > m_freem(fffffd8035fd9400) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001c922500,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ef000) at wg_encap_worker+0x79
> > end trace frame: 0xffff80001c922700, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb> show panic
> > *cpu0: pr_find_pagehead: mbufpl: incorrect page
> > ddb> trace
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_put+0x53
> > m_free(fffffd8035fd9400) at m_free+0xa6
> > m_freem(fffffd8035fd9400) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001c922500,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ef000) at wg_encap_worker+0x79
> > taskq_thread(ffff800000889080) at taskq_thread+0xf0
> > end trace frame: 0x0, count: -15
> > ddb> ps
> >   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
> > 51969  144614  37729      0  2  0x18000003                btrace
> > 40841  474945  76353   1000  3   0x810008b  sigsusp       ksh
> > 76353  455143  78366   1000  3  0x18000098  kqread        sshd-session
> > 78366  500790  60748      0  3  0x18000092  kqread        sshd-session
> >  1661  483333  93900     89  3  0x19100092  kqread        relayd
> > 20971  454162  93900     89  3  0x19100092  kqread        relayd
> > 66174   90602  93900     89  3  0x19100092  kqread        relayd
> > 48738  445549  93900     89  3  0x19100092  kqread        relayd
> > 88711   54303  93900     89  3  0x19100092  kqread        relayd
> > 33085  157864  93900     89  2  0x19100012                relayd
> > 36613  263398  93900     89  3  0x19100092                relayd
> > 93900   61929      1      0  3  0x18000080  kqread        relayd
> > 58569  410836      1      0  3   0x8100083  ttyin         ksh
> > 30102  428727      1      0  3  0x18100098  kqread        cron
> > *73351  225925      0      0  7     0x14200                wg_crypt
> > 25707  237828      0      0  3     0x14200  bored         wg_handshake
> > 75251  422241      0      0  3     0x14200  bored         wg_handshake
> > 89402  219146      1    110  3  0x18100090  kqread        sndiod
> >  1652  116066      1     99  3  0x19100090  kqread        sndiod
> > 41636  131173  47944     95  3  0x19100092  kqread        smtpd
> > 56159  435661  47944    103  3  0x19100092  kqread        smtpd
> > 30864  263446  47944     95  3  0x18100092  kqread        smtpd
> > 64861   75991  47944     95  3  0x19100092  kqread        smtpd
> > 74399  157341  47944     95  3  0x19100092  kqread        smtpd
> > 47944  325461      1      0  3  0x18100080  kqread        smtpd
> > 60748  251840      1      0  3  0x18000088  kqread        sshd
> > 93282   26115      1      0  3  0x18100080  kqread        ntpd
> > 12262  492605  81276     83  3  0x18100092  kqread        ntpd
> > 81276  343918      1     83  2  0x19100492                ntpd
> > 24416  419389  95291     74  3  0x19100092  bpf           pflogd
> > 95291   58348      1      0  3  0x18000080  sbwait        pflogd
> > 99456   71886  56811     73  3  0x19100090  kqread        syslogd
> > 57202  274926  82913     77  3  0x18100092  kqread        dhcpleased
> > 93609  415070  82913     77  3  0x18100092  kqread        dhcpleased
> > 82913   38615      1      0  3  0x18000080  kqread        dhcpleased
> > 39413   85502  22242    115  3  0x18100092  kqread        slaacd
> > 84235  356871  22242    115  3  0x18100092  kqread        slaacd
> > 22242  283359      1      0  3  0x18100080  kqread        slaacd
> > 53776  372278      0      0  3     0x14200  bored         smr
> > 16202  188026      0      0  3     0x14200  pgzero        zerothread
> > 40368  204141      0      0  3     0x14200  aiodoned      aiodoned
> > 18183  419428      0      0  3     0x14200  syncer        update
> > 79669  281449      0      0  3     0x14200  cleaner       cleaner
> > 80971   55573      0      0  3     0x14200  reaper        reaper
> > 88433  220842      0      0  3     0x14200  pgdaemon      pagedaemon
> > 34834  242944      0      0  3     0x14200  bored         softnet3
> > 28119  493362      0      0  3     0x14200  bored         softnet2
> > 41877  463150      0      0  3     0x14200  bored         softnet1
> > 16167  354819      0      0  3     0x14200  bored         softnet0
> > 93717  296304      0      0  3     0x14200  bored         systqmp
> > 45065   39416      0      0  3     0x14200  bored         systq
> > 46106   21722      0      0  3  0x40014200  tmoslp        softclock
> > 25869  146461      0      0  3  0x40014200                idle0
> >     1  357659      0      0  3   0x8000082  wait          init
> >     0       0     -1      0  3     0x10200  scheduler     swapper
> > ddb> show registers
> > rdi                              0x4
> > rsi                             0x14
> > rbp               0xffff80001c9221b0
> > rbx               0xfffffd8035fd9400
> > rdx                            0x3fd
> > rcx               0x4800000000000111
> > rax                             0x30
> > r8                 0x101010101010101
> > r9                                 0
> > r10               0x8dd14be7a93050dc
> > r11               0xe3e5f94705a0c9e7
> > r12               0xfffffd8035fd9400
> > r13               0xffff800000024800
> > r14                                0
> > r15               0xffffffff82161d70    pp_r600_decoded_lanes+0xc8aa
> > rip               0xffffffff81fa1d44    db_enter+0x14
> > cs                               0x8
> > rflags                         0x286
> > rsp               0xffff80001c9221b0
> > ss                              0x10
> > db_enter+0x14:  popq    %rbp
> > 
> 
