On Tue, 31 Jul 2018, giova...@paclan.it wrote:
> >Synopsis:	Every now and then I hit ddb with double fault trap, code=0
> >Category:	acpi
> >Environment:
>	System      : OpenBSD 6.3
>	Details     : OpenBSD 6.3-current (GENERIC) #143: Fri Jul 27 04:38:01 MDT 2018
>	    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> >Description:
> Every couple of days I hit ddb:
> double fault trap, code=0
> Stopped at __mtx_enter+0xf: pushq %r11
> ddb{0}> bt
> __mtx_enter(0) at __mtx_enter+0xf
> i915_get_crtc_scanoutpos(f69da68b441aff13,ffff800000169156,ffff80000015f800,ffff80000015f800,1,0)
>     at i915_get_crtc_scanoutpos+0xce
> drm_calc_vbltimestamp_from_scanoutpos(6551790fe8f1e00a,0,ffff80000015f800,0,ffff80000015f800,453d)
>     at drm_calc_vbltimestamp_from_scanoutpos+0x92
> drm_update_vblank_count() at drm_update_vblank_count+0x9b
> drm_handle_vblank() at drm_handle_vblank+0xd1
> ironlake_irq_handler(575039e4693defc4,ffff80000015d700)
>     at ironlake_irq_handler+0x320
> intr_handler(84d668fbeb1151573,0) at intr_handler+0x68
> Xintr_ioapic_edge16_untramp(0,0,1,0,ffffffff81b329e8,ffffff012cdc33f0)
>     at Xintr_ioapic_edge16_untramp+0x19f
> uvm_map_addr_RBT_AUGMENT(1aa311ba321d33a4) at uvm_map_addr_RBT_AUGMENT
> uvm_mapent_addr_remove(ffffffff81c9ba58,ffff800032d20000)
>     at uvm_mapent_addr_remove+0x67
> uvm_mapent_mkfree(709092b46cd75947,ffff800032d20000,ffffff012cdc33f0,ffffffff81c9ba58,ffffff012cdc3000)
>     at uvm_mapent_mkfree+0xc9
> uvm_unmap_remove(da20d5b12b851735,ffff800032d2000,ffffffff81c9ba58,ffff800032cb2540,ffff800032d1f000,1)
>     at uvm_unmap_remove+0x2cf
> uvm_unmap(709092b46c9e3347,ffff800032d1f000,ffff800032d20000)
>     at uvm_unmap+0x75
> km_free(4033f06f727571cc,514,0,1000) at km_free+0x4f
> _bus_space_unmap(ddf1a0675f9a0f6,1,0,ffffffff81b7aaf8)
>     at _bus_space_unmap+0xdd
> acpi_gasio(ad65595461093493,0,0,ffff8000009293a0,ffff800032cb2768,1)
>     at acpi_gasio+0x242
> aml_opreg_sysmem_handler(14f676a0776defdb,ffff800032cb2748,ffffffff818347d0,ffff800032cb26d0,ad65595461093493)
>     at aml_opreg_sysmem_handler+0x30
> aml_rwgen(221071f83d28047,ffff800000929388,ffff800000062088,28a2,ffff80000048188,1)
>     at aml_rwgen+0x650
> aml_rwfield(771ec935baef5db0,ffff80000075f308,69,69,ffff800000062088)
>     at aml_rwfield+0x3a5
> aml_eval(79dedd78a5dfabfb,ffff80000075f308,ffff80000035031,69,ffff800000062088)
>     at aml_eval+0x1f7
> aml_parse(de6bf302f7589bb6,ffff80000075f308,ffff800000035021)
>     at aml_parse+0x54
> ....three more pages of the last line....
> aml_eval(e4d65b8caee09c80,0,ffff800000089408,2,0) at aml_eval+0x323
> aml_evalnode(bcb11c8975c0ae9,ffff800000026400,ffff800000026400,2,0)
>     at aml_evalnode+0xae
> acpi_gpe(2c80ceef08cd0301,ffff800000026400,ffff80000002bc40)
>     at acpi_gpe+0x35
> acpi_thread(0) at acpi_thread+0x188
> end trace frame: 0x0, count: -65

And quoting a previous off-list email:
> every now and then, starting from at least a month ago my laptop
> enters ddb with "Double fault trap, code=0".
> Most of the times it is in ieee80211 and at a first glance I
> looked at iwm(4), but today it happened also with intel(4).

So it's double-faulting because it's running off the end of the kernel
stack for the ACPI thread, due to a combination of deeply nested AML and
stack usage by the DRM and/or 802.11 interrupt handlers.

I don't see any recent changes in the ACPI stack which would cause a
change in behavior on this box (it doesn't have GenericSerialBus, or _DSD
properties, or an sdhc device), so either
 a) the change in stack consumption is from the DRM and 802.11 side, OR
 b) did you update the BIOS around the time this started?

> bios0: vendor TOSHIBA version "Version 5.10" date 04/18/2018

Perhaps the new version uses more deeply nested AML.

For those wondering about the iwm/802.11 case, the photo previously sent
had the trace of the interrupt frame going, from bottom up:
 -> Xintr_ioapic_edge24_untramp
 -> intr_handler
 -> iwm_intr
 -> iwm_rx_pkt
 -> iwm_rx_mpdu
 -> iwm_rx_frame
 -> ieee80211_input
 -> ieee80211_recv_probe_resp
 -> ieee80211_find_node_for_beacon

Are any of those using more stack space than before?

Not sure what we want to do here.
 - if this did start after updating the BIOS, see if there's a newer one
   or maybe downgrade
 - if we can identify an increase in stack use in an interrupt path, we
   should fix that
 - making aml_parse() iterative instead of recursive...by tracking frames
   of AML state in an explicit stack...would be annoying, more complex to
   maintain, and probably inefficient.  Maybe it's time to let kernel
   threads request a larger than default stack size and have acpi_thread
   request another page or so?
 - if all else fails, there's always increasing UPAGES... <barf>

Philip Guenther
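
For anyone unfamiliar with what "tracking frames of state in an explicit
stack" means in practice, here is a minimal userland sketch of the general
technique. It is not OpenBSD's AML interpreter: struct term, struct frame
and eval_iter() are hypothetical names made up for illustration, and the
"term" is just a tree of integers to be summed. The point is only that
nesting depth costs heap memory here instead of C stack, at the price of
the extra bookkeeping mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a nested term: a value plus up to two subterms. */
struct term {
	int		 value;
	struct term	*kid[2];	/* NULL when absent */
};

/* One saved "call frame": which term we are in and how far along we are. */
struct frame {
	const struct term *t;
	int		 next;		/* index of the next kid to visit */
	long		 acc;		/* partial sum for this term */
};

/*
 * Sum every value in the tree without recursion.  Frames are kept in a
 * heap-allocated array that grows on demand, so deep nesting does not
 * consume C stack.  Returns 0 on success, -1 if allocation fails.
 */
int
eval_iter(const struct term *root, long *result)
{
	size_t cap = 16, sp = 0;
	struct frame *stk, *tmp;
	long ret = 0;

	assert(root != NULL);
	if ((stk = malloc(cap * sizeof(*stk))) == NULL)
		return -1;
	stk[sp++] = (struct frame){ root, 0, root->value };

	while (sp > 0) {
		struct frame *f = &stk[sp - 1];

		/* Skip absent kids. */
		while (f->next < 2 && f->t->kid[f->next] == NULL)
			f->next++;

		if (f->next < 2) {
			/* "Call": push a frame for the next subterm. */
			const struct term *child = f->t->kid[f->next++];

			if (sp == cap) {
				tmp = realloc(stk, 2 * cap * sizeof(*stk));
				if (tmp == NULL) {
					free(stk);
					return -1;
				}
				stk = tmp;	/* f is stale from here on */
				cap *= 2;
			}
			stk[sp++] = (struct frame){ child, 0, child->value };
			continue;
		}

		/* "Return": pop and fold the result into the caller's frame. */
		ret = f->acc;
		sp--;
		if (sp > 0)
			stk[sp - 1].acc += ret;
	}

	*result = ret;
	free(stk);
	return 0;
}
```

A caller builds a tree of struct term, passes the root to eval_iter(), and
checks the return value before using *result. Even in this toy the "current
position" has to be saved and restored by hand in every frame, which is the
maintenance cost that makes the larger-stack option look attractive.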