Hi,

I think the problem is that the thread pointed to by tdwait has exited. I would say it is not safe to peek into another record's threads, because they may change under the hood and are not protected by the current context.


        if (record->er_cpuid != curcpu) {

This optimisation is invalid or needs to be revisited:
                /*
                 * If the head of the list is running, we can wait for it
                 * to remove itself from the list and thus save us the
                 * overhead of a migration
                 */
                if ((tdwait = TAILQ_FIRST(&record->er_tdlist)) != NULL &&
                    TD_IS_RUNNING(tdwait->et_td)) {
                        gen = record->er_gen;
                        thread_unlock(td);
                        do {
                                cpu_spinwait();
                        } while (tdwait == TAILQ_FIRST(&record->er_tdlist) &&
                            gen == record->er_gen && TD_IS_RUNNING(tdwait->et_td) &&
                            spincount++ < MAX_ADAPTIVE_SPIN);
                        thread_lock(td);
                        return;
                }
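
To illustrate the concern, here is a minimal userspace sketch of a spin condition that compares the snapshot pointer and generation but never dereferences the snapshot. All names (model_thread, model_tracker, model_record, keep_spinning) are hypothetical stand-ins and not the subr_epoch.c code, and this is not a proposed patch. The model is single-threaded; real kernel code would also need appropriate atomic loads, and dropping the TD_IS_RUNNING() check means the waiter may spin up to the limit even when the head thread has been preempted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_thread {
	bool running;
};

struct model_tracker {
	struct model_thread *et_td;
};

struct model_record {
	struct model_tracker *head;	/* models TAILQ_FIRST(&record->er_tdlist) */
	unsigned int gen;		/* models record->er_gen */
};

/*
 * Spin condition that only compares the snapshot pointer by identity and
 * checks the generation. It never reads through 'snap', so it cannot touch
 * memory that was released after the snapshot was taken.
 */
static bool
keep_spinning(const struct model_record *rec,
    const struct model_tracker *snap, unsigned int gen_snap,
    int *spincount, int max_spin)
{
	return (rec->head == snap && rec->gen == gen_snap &&
	    (*spincount)++ < max_spin);
}
```

Once the head entry removes itself (head changes, generation is bumped), keep_spinning() returns false without ever looking at the old tracker or its thread.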


--HPS

On 08/05/18 22:01, Matthew Macy wrote:
If you could give me a self-contained reproducer that would expedite a fix.

Thanks.
-M

On Sun, Aug 5, 2018 at 08:36 Roman Bogorodskiy <no...@freebsd.org> wrote:

I am running -CURRENT r336863 on amd64 and get the following panic right
after (or during) boot:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0xdeadc2ff
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bd7858
stack pointer           = 0x28:0xfffffe008b445580
frame pointer           = 0x28:0xfffffe008b4455c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 903 (libvirtd)

Traceback is:

(kgdb) #0  doadump (textdump=0) at pcpu.h:230
#1  0xffffffff8043dc7b in db_dump (dummy=<value optimized out>,
     dummy2=<value optimized out>, dummy3=<value optimized out>,
     dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:574
#2  0xffffffff8043da49 in db_command (cmd_table=<value optimized out>)
     at /usr/src/sys/ddb/db_command.c:481
#3  0xffffffff8043d7c4 in db_command_loop ()
     at /usr/src/sys/ddb/db_command.c:534
#4  0xffffffff804409ef in db_trap (type=<value optimized out>,
     code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:252
#5  0xffffffff80bdd513 in kdb_trap (type=12, code=0, tf=<value optimized out>)
     at /usr/src/sys/kern/subr_kdb.c:693
#6  0xffffffff810769f1 in trap_fatal (frame=0xfffffe008b4454c0, eva=3735929599)
     at /usr/src/sys/amd64/amd64/trap.c:884
#7  0xffffffff81076b12 in trap_pfault (frame=0xfffffe008b4454c0,
     usermode=<value optimized out>) at pcpu.h:230
#8  0xffffffff8107611a in trap (frame=0xfffffe008b4454c0)
     at /usr/src/sys/amd64/amd64/trap.c:427
#9  0xffffffff810518ac in calltrap ()
     at /usr/src/sys/amd64/amd64/exception.S:230
#10 0xffffffff80bd7858 in epoch_block_handler_preempt (
     global=<value optimized out>, cr=0xfffffe00760c3a00,
     arg=<value optimized out>) at /usr/src/sys/kern/subr_epoch.c:256
#11 0xffffffff803994fd in ck_epoch_synchronize_wait (
     global=0xfffff800030c5680,
     cb=0xffffffff80bd77a0 <epoch_block_handler_preempt>, ct=0x0)
     at /usr/src/sys/contrib/ck/src/ck_epoch.c:407
#12 0xffffffff80bd7630 in epoch_wait_preempt (epoch=0xfffff800030c5680)
     at /usr/src/sys/kern/subr_epoch.c:389
#13 0xffffffff80c983bf in if_delgroup (ifp=0xfffff80003aab800,
     groupname=0xfffff80005ff5e00 "bridge") at /usr/src/sys/net/if.c:1514
#14 0xffffffff80c9f2b2 in if_clone_destroyif (ifc=0xfffff80005ff5e00,
     ifp=0xfffff80003aab800) at /usr/src/sys/net/if_clone.c:325
#15 0xffffffff80c9f0d5 in if_clone_destroy (name=0xfffffe008b4458d0 "virbr0")
     at /usr/src/sys/net/if_clone.c:288
#16 0xffffffff80c9a2c3 in ifioctl (so=0xfffff80007edca38, cmd=2149607801,
     data=<value optimized out>, td=<value optimized out>)
     at /usr/src/sys/net/if.c:3053
#17 0xffffffff80c04259 in kern_ioctl (td=0xfffff80007c1a580,
     fd=<value optimized out>, com=<value optimized out>,
     data=<value optimized out>) at file.h:330
#18 0xffffffff80c03f2e in sys_ioctl (td=0xfffff80007c1a580,
     uap=0xfffff80007c1a940) at /usr/src/sys/kern/sys_generic.c:712
#19 0xffffffff81077401 in amd64_syscall (td=0xfffff80007c1a580, traced=0)
     at subr_syscall.c:135
#20 0xffffffff8105218d in fast_syscall_common ()
     at /usr/src/sys/amd64/amd64/exception.S:500
#21 0x00000008028f4c0a in ?? ()


Previous frame inner to this frame (corrupt stack?)


Current language:  auto; currently minimal


(kgdb)

It looks like the panic happens during network-interface-related
operations. Here are the last dmesg lines before the panic:

Aug  5 19:02:42 romashka rtsold[585]: <rtsock_input_ifannounce> interface bridge0 removed
Aug  5 19:02:42 romashka kernel: bridge0: Ethernet address: 02:af:41:48:c7:00
Aug  5 19:02:42 romashka kernel: bridge0: changing name to 'virbr-ab'
Aug  5 19:02:42 romashka kernel: tap0: Ethernet address: 00:bd:8d:11:f7:00
Aug  5 19:02:42 romashka kernel: tap0: link state changed to UP
Aug  5 19:02:42 romashka kernel: tap0: changing name to 'virbr-ab-nic'
Aug  5 19:02:42 romashka kernel: virbr-ab-nic: promiscuous mode enabled
Aug  5 19:02:42 romashka kernel: virbr-ab: link state changed to UP
Aug  5 19:02:42 romashka rtsold[585]: <rtsock_input_ifannounce> interface tap0 removed
Aug  5 19:02:43 romashka dnsmasq[1047]: setting --bind-interfaces option because of OS limitations
Aug  5 19:02:43 romashka dnsmasq[1047]: warning: no upstream servers configured
Aug  5 19:02:43 romashka kernel: virbr-ab-nic: link state changed to DOWN
Aug  5 19:02:43 romashka kernel: virbr-ab: link state changed to DOWN
Aug  5 19:02:43 romashka kernel: bridge1: Ethernet address: 02:af:41:48:c7:01
Aug  5 19:02:43 romashka kernel: bridge1: changing name to 'virbr0'
Aug  5 19:02:43 romashka rtsold[585]: <rtsock_input_ifannounce> interface bridge1 removed
Aug  5 19:02:43 romashka kernel: tap1: Ethernet address: 00:bd:53:14:f7:01
Aug  5 19:02:43 romashka kernel: tap1: link state changed to UP
Aug  5 19:02:43 romashka kernel: tap1: changing name to 'virbr0-nic'
Aug  5 19:02:43 romashka kernel: virbr0: link state changed to UP
Aug  5 19:02:43 romashka kernel: virbr0-nic: promiscuous mode enabled
Aug  5 19:02:43 romashka rtsold[585]: <rtsock_input_ifannounce> interface tap1 removed
Aug  5 19:05:03 romashka syslogd: kernel boot file is /boot/kernel/kernel
Aug  5 19:05:03 romashka kernel:
Aug  5 19:05:03 romashka syslogd: last message repeated 1 times
Aug  5 19:05:03 romashka kernel: Fatal trap 12: page fault while in kernel mode

If I disable the libvirt service, the system finishes booting fine. On
start, libvirt creates a couple of bridge(4) and tap(4) devices, adds the
tap devices to the bridges it created, and possibly destroys these
interfaces in case of errors. It also starts dnsmasq on some of these
interfaces.

This problem started to appear about 2-4 weeks ago.

Roman Bogorodskiy

_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

