On Tue, 31 Jul 2018, [email protected] wrote:
> >Synopsis: Every now and then I hit ddb with double fault trap, code=0
> >Category: acpi
> >Environment:
> System : OpenBSD 6.3
> Details : OpenBSD 6.3-current (GENERIC) #143: Fri Jul 27 04:38:01
> MDT 2018
>
> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
> >Description:
> Every couple of days I hit ddb:
> double fault trap, code=0
> Stopped at __mtx_enter+0xf: pushq %r11
> ddb{0}> bt
> __mtx_enter(0) at __mtx_enter+0xf
>
> i915_get_crtc_scanoutpos(f69da68b441aff13,ffff800000169156,ffff80000015f800,ffff80000015f800,1,0)
> at i915_get_crtc_scanoutpos+0xce
>
> drm_calc_vbltimestamp_from_scanoutpos(6551790fe8f1e00a,0,ffff80000015f800,0,ffff80000015f800,453d)
> at drm_calc_vbltimestamp_from_scanoutpos+0x92
> drm_update_vblank_count() at drm_update_vblank_count+0x9b
> drm_handle_vblank() at drm_handle_vblank+0xd1
> ironlake_irq_handler(575039e4693defc4,ffff80000015d700) at
> ironlake_irq_handler+0x320
> intr_handler(84d668fbeb1151573,0) at intr_handler+0x68
> Xintr_ioapic_edge16_untramp(0,0,1,0,ffffffff81b329e8,ffffff012cdc33f0)
> at Xintr_ioapic_edge16_untramp+0x19f
> uvm_map_addr_RBT_AUGMENT(1aa311ba321d33a4) at uvm_map_addr_RBT_AUGMENT
> uvm_mapent_addr_remove(ffffffff81c9ba58,ffff800032d20000) at
> uvm_mapent_addr_remove+0x67
>
> uvm_mapent_mkfree(709092b46cd75947,ffff800032d20000,ffffff012cdc33f0,ffffffff81c9ba58,ffffff012cdc3000)
> at uvm_mapent_mkfree+0xc9
>
> uvm_unmap_remove(da20d5b12b851735,ffff800032d2000,ffffffff81c9ba58,ffff800032cb2540,ffff800032d1f000,1)
> at uvm_unmap_remove+0x2cf
> uvm_unmap(709092b46c9e3347,ffff800032d1f000,ffff800032d20000) at
> uvm_unmap+0x75
> km_free(4033f06f727571cc,514,0,1000) at km_free+0x4f
> _bus_space_unmap(ddf1a0675f9a0f6,1,0,ffffffff81b7aaf8) at
> _bus_space_unmap+0xdd
> acpi_gasio(ad65595461093493,0,0,ffff8000009293a0,ffff800032cb2768,1) at
> acpi_gasio+0x242
>
> aml_opreg_sysmem_handler(14f676a0776defdb,ffff800032cb2748,ffffffff818347d0,ffff800032cb26d0,ad65595461093493)
> at aml_opreg_sysmem_handler+0x30
>
> aml_rwgen(221071f83d28047,ffff800000929388,ffff800000062088,28a2,ffff80000048188,1)
> at aml_rwgen+0x650
> aml_rwfield(771ec935baef5db0,ffff80000075f308,69,69,ffff800000062088)
> at aml_rwfield+0x3a5
>
> aml_eval(79dedd78a5dfabfb,ffff80000075f308,ffff80000035031,69,ffff800000062088)
> at aml_eval+0x1f7
> aml_parse(de6bf302f7589bb6,ffff80000075f308,ffff800000035021) at
> aml_parse+0x54
> ....three more pages of the last line....
> aml_eval(e4d65b8caee09c80,0,ffff800000089408,2,0) at aml_eval+0x323
> aml_evalnode(bcb11c8975c0ae9,ffff800000026400,ffff800000026400,2,0) at
> aml_evalnode+0xae
> acpi_gpe(2c80ceef08cd0301,ffff800000026400,ffff80000002bc40) at
> acpi_gpe+0x35
> acpi_thread(0) at acpi_thread+0x188
> end trace frame: 0x0, count: -65
And quoting a previous off-list email:
> every now and then, starting from at least a month ago my laptop
> enters ddb with "Double fault trap, code=0".
> Most of the times it is in ieee80211 and at a first glance I
> looked at iwm(4), but today it happened also with intel(4).
So it's double-faulting because it's running off the end of the kernel
stack for the ACPI thread due to a combination of deeply nested AML and
stack usage by the DRM and/or 802.11 interrupt handlers.
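To make that budget concrete, here is a rough back-of-the-envelope model in C. It is only a sketch: the function name and every parameter are hypothetical, and the per-level and interrupt costs would have to be measured on the real kernel rather than guessed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: given a fixed stack size, the
 * bytes already used before AML evaluation starts, the bytes consumed
 * per level of AML nesting, and the worst-case bytes an interrupt
 * handler chain needs on the same stack, return the deepest nesting
 * level that still leaves room for that interrupt.
 */
size_t
max_safe_depth(size_t stack_bytes, size_t base_bytes,
    size_t per_level_bytes, size_t intr_bytes)
{
	size_t reserved = base_bytes + intr_bytes;

	assert(per_level_bytes > 0);
	if (reserved >= stack_bytes)
		return 0;	/* no room for any nesting at all */
	return (stack_bytes - reserved) / per_level_bytes;
}
```

Either a larger per-level cost (deeper or fatter AML frames) or a larger interrupt cost lowers the safe depth, which is why both sides are suspects here.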
I don't see any recent changes in the ACPI stack which would cause a
change in behavior on this box (it doesn't have GenericSerialBus, or _DSD
properties, or an sdhc device), so either
a) the change in stack consumption is from the DRM and 802.11 side, OR
b) did you update the BIOS around the time this started?
> bios0: vendor TOSHIBA version "Version 5.10" date 04/18/2018
Perhaps the new version uses more deeply nested AML.
For those wondering about the iwm/802.11 case, the photo previously sent
had the trace of the interrupt frame going, from bottom up:
-> Xintr_ioapic_edge24_untramp
-> intr_handler
-> iwm_intr
-> iwm_rx_pkt
-> iwm_rx_mpdu
-> iwm_rx_frame
-> ieee80211_input
-> ieee80211_recv_probe_resp
-> ieee80211_find_node_for_beacon
Are any of those using more stack-space than before?
Not sure what we want to do here.
- if this did start after updating the BIOS, see if there's a newer one
or maybe downgrade
- if we can identify an increase in stack use in an interrupt path, we
should fix that
- making aml_parse() iterative instead of recursive...by tracking frames
of AML state in an explicit stack...would be annoying, more complex to
maintain, and probably inefficient. Maybe it's time to let kernel
threads request a larger than default stack size and have acpi_thread
request another page or so?
- if all else fails, there's always increasing UPAGES... <barf>
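For anyone unfamiliar with the "explicit stack" option, here is a minimal userland sketch of the idea on a toy expression tree. The struct node, struct frame and eval_* names are made up for illustration and have nothing to do with the real aml_* types; the real parser carries far more state per frame.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy node standing in for an AML term. */
struct node {
	int		 value;		/* used when the node is a leaf */
	struct node	*left, *right;	/* both NULL for a leaf, both set otherwise */
};

/* Recursive form: one C stack frame per level of nesting. */
long
eval_recursive(const struct node *n)
{
	if (n->left == NULL)
		return n->value;
	return eval_recursive(n->left) + eval_recursive(n->right);
}

/* One saved frame of evaluator state, kept on the heap. */
struct frame {
	const struct node	*n;
	int			 state;	/* 0: start, 1: left pushed, 2: right pushed */
	long			 acc;	/* value of the left child */
};

/* Iterative form: nesting depth consumes heap instead of the C stack. */
long
eval_iterative(const struct node *root, size_t maxdepth)
{
	struct frame *stk = calloc(maxdepth, sizeof(*stk));
	size_t sp = 0;
	long ret = 0;

	assert(stk != NULL && maxdepth > 0);
	stk[sp++].n = root;		/* calloc left state and acc at 0 */
	while (sp > 0) {
		struct frame *f = &stk[sp - 1];

		if (f->n->left == NULL) {
			ret = f->n->value;	/* leaf: "return" its value */
			sp--;
		} else if (f->state == 0) {
			f->state = 1;
			assert(sp < maxdepth);	/* explicit overflow check */
			stk[sp].n = f->n->left;
			stk[sp].state = 0;
			sp++;
		} else if (f->state == 1) {
			f->acc = ret;		/* left child finished */
			f->state = 2;
			assert(sp < maxdepth);
			stk[sp].n = f->n->right;
			stk[sp].state = 0;
			sp++;
		} else {
			ret = f->acc + ret;	/* right child finished */
			sp--;
		}
	}
	free(stk);
	return ret;
}
```

Even for this toy the iterative version needs a hand-written state machine and an explicit depth check, which is the maintenance cost mentioned above. The upside is that running out of depth becomes a detectable condition rather than a double fault.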
Philip Guenther