Re: double fault trap, code=0

Philip Guenther Tue, 31 Jul 2018 14:05:59 -0700

On Tue, 31 Jul 2018, giova...@paclan.it wrote:
> >Synopsis:    Every now and then I hit ddb with double fault trap, code=0
> >Category:    acpi
> >Environment:
>       System      : OpenBSD 6.3
>       Details     : OpenBSD 6.3-current (GENERIC) #143: Fri Jul 27 04:38:01 MDT 2018
>                        dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> >Description:
>       Every couple of days I hit ddb:
>       double fault trap, code=0
>       Stopped at __mtx_enter+0xf:     pushq %r11
>       ddb{0}> bt
>       __mtx_enter(0) at __mtx_enter+0xf
>       i915_get_crtc_scanoutpos(f69da68b441aff13,ffff800000169156,ffff80000015f800,ffff80000015f800,1,0) at i915_get_crtc_scanoutpos+0xce
>       drm_calc_vbltimestamp_from_scanoutpos(6551790fe8f1e00a,0,ffff80000015f800,0,ffff80000015f800,453d) at drm_calc_vbltimestamp_from_scanoutpos+0x92
>       drm_update_vblank_count() at drm_update_vblank_count+0x9b
>       drm_handle_vblank() at drm_handle_vblank+0xd1
>       ironlake_irq_handler(575039e4693defc4,ffff80000015d700) at ironlake_irq_handler+0x320
>       intr_handler(84d668fbeb1151573,0) at intr_handler+0x68
>       Xintr_ioapic_edge16_untramp(0,0,1,0,ffffffff81b329e8,ffffff012cdc33f0) at Xintr_ioapic_edge16_untramp+0x19f
>       uvm_map_addr_RBT_AUGMENT(1aa311ba321d33a4) at uvm_map_addr_RBT_AUGMENT
>       uvm_mapent_addr_remove(ffffffff81c9ba58,ffff800032d20000) at uvm_mapent_addr_remove+0x67
>       uvm_mapent_mkfree(709092b46cd75947,ffff800032d20000,ffffff012cdc33f0,ffffffff81c9ba58,ffffff012cdc3000) at uvm_mapent_mkfree+0xc9
>       uvm_unmap_remove(da20d5b12b851735,ffff800032d2000,ffffffff81c9ba58,ffff800032cb2540,ffff800032d1f000,1) at uvm_unmap_remove+0x2cf
>       uvm_unmap(709092b46c9e3347,ffff800032d1f000,ffff800032d20000) at uvm_unmap+0x75
>       km_free(4033f06f727571cc,514,0,1000) at km_free+0x4f
>       _bus_space_unmap(ddf1a0675f9a0f6,1,0,ffffffff81b7aaf8) at _bus_space_unmap+0xdd
>       acpi_gasio(ad65595461093493,0,0,ffff8000009293a0,ffff800032cb2768,1) at acpi_gasio+0x242
>       aml_opreg_sysmem_handler(14f676a0776defdb,ffff800032cb2748,ffffffff818347d0,ffff800032cb26d0,ad65595461093493) at aml_opreg_sysmem_handler+0x30
>       aml_rwgen(221071f83d28047,ffff800000929388,ffff800000062088,28a2,ffff80000048188,1) at aml_rwgen+0x650
>       aml_rwfield(771ec935baef5db0,ffff80000075f308,69,69,ffff800000062088) at aml_rwfield+0x3a5
>       aml_eval(79dedd78a5dfabfb,ffff80000075f308,ffff80000035031,69,ffff800000062088) at aml_eval+0x1f7
>       aml_parse(de6bf302f7589bb6,ffff80000075f308,ffff800000035021) at aml_parse+0x54
>       ....three more pages of the last line....
>       aml_eval(e4d65b8caee09c80,0,ffff800000089408,2,0) at aml_eval+0x323
>       aml_evalnode(bcb11c8975c0ae9,ffff800000026400,ffff800000026400,2,0) at aml_evalnode+0xae
>       acpi_gpe(2c80ceef08cd0301,ffff800000026400,ffff80000002bc40) at acpi_gpe+0x35
>       acpi_thread(0) at acpi_thread+0x188
>       end trace frame: 0x0, count: -65


And quoting a previous off-list email:
> every now and then, starting from at least a month ago my laptop
> enters ddb with "Double fault trap, code=0".
> Most of the times it is in ieee80211 and at a first glance I
> looked at iwm(4), but today it happened also with intel(4).

So it's double-faulting because it runs off the end of the ACPI thread's
kernel stack: deeply nested AML has already consumed most of that stack,
and the DRM and/or 802.11 interrupt handler taken on top of it pushes it
past the end.
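A toy model of the arithmetic helps show why this only bites every couple
of days. The C sketch below is hypothetical: the function names and every
byte count are made-up placeholders, not measurements of the OpenBSD/amd64
kernel, and the real per-thread stack size depends on UPAGES and the page
size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of one kernel thread's stack. All sizes are hypothetical
 * placeholders, not measurements.
 *   base      - frames below the AML interpreter (acpi_thread, acpi_gpe, ...)
 *   per_level - bytes used by one level of nested AML evaluation
 *   irq_bytes - frames pushed by an interrupt handler taken on this stack
 */
static size_t
stack_needed(size_t base, size_t depth, size_t per_level, size_t irq_bytes)
{
	return base + depth * per_level + irq_bytes;
}

/* True if the whole chain of frames fits in a stack of `budget` bytes. */
static bool
fits_on_stack(size_t budget, size_t base, size_t depth, size_t per_level,
    size_t irq_bytes)
{
	return stack_needed(base, depth, per_level, irq_bytes) <= budget;
}

/* Deepest AML nesting that still leaves room for the interrupt frames. */
static size_t
max_safe_depth(size_t budget, size_t base, size_t per_level, size_t irq_bytes)
{
	assert(per_level > 0 && budget >= base + irq_bytes);
	return (budget - base - irq_bytes) / per_level;
}
```

With a 16384-byte budget, 2048 bytes of base frames, 256 bytes per AML
level and 1536 bytes of interrupt frames, max_safe_depth() returns 50. At
depth 51 the total is 16640 bytes and the stack overflows, yet the same
depth fits (15104 bytes) if no interrupt arrives at that moment. That is
consistent with a crash that needs the deep AML path and a DRM or 802.11
interrupt to coincide.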

I don't see any recent changes in the ACPI code which would change the
behavior on this box (it doesn't have GenericSerialBus, _DSD properties,
or an sdhc device), so either
 a) the change in stack consumption is on the DRM and 802.11 side, OR
 b) the BIOS changed: did you update it around the time this started?
        > bios0: vendor TOSHIBA version "Version 5.10" date 04/18/2018
    Perhaps the new version uses more deeply nested AML.


For those wondering about the iwm/802.11 case, the photo sent earlier
showed this trace for the interrupt frames, from the bottom up:

-> Xintr_ioapic_edge24_untramp
-> intr_handler
-> iwm_intr
-> iwm_rx_pkt
-> iwm_rx_mpdu
-> iwm_rx_frame
-> ieee80211_input
-> ieee80211_recv_probe_resp
-> ieee80211_find_node_for_beacon

Are any of those using more stack-space than before?
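One way to answer that is to collect per-function static stack usage for
the frames on this path from an older and a newer kernel build and compare
them; with GCC, for example, `-fstack-usage` writes per-function frame
sizes into `.su` files. The C sketch below only does the comparison.
struct frame_usage, path_total() and largest_growth() are hypothetical
helpers, and any numbers fed to them have to come from real measurements.

```c
#include <assert.h>
#include <stddef.h>

/*
 * One frame on an interrupt path, with its static stack usage in bytes
 * in the old ("before") and new ("after") build. Hypothetical type.
 */
struct frame_usage {
	const char	*func;
	size_t		 before;
	size_t		 after;
};

/* Total stack the path needs: the sum of every frame on it. */
static size_t
path_total(const struct frame_usage *f, size_t n, int use_after)
{
	size_t total = 0;

	assert(f != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += use_after ? f[i].after : f[i].before;
	return total;
}

/* Frame whose usage grew the most between builds, or NULL if none grew. */
static const struct frame_usage *
largest_growth(const struct frame_usage *f, size_t n)
{
	const struct frame_usage *worst = NULL;
	size_t best = 0;

	assert(f != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (f[i].after > f[i].before &&
		    f[i].after - f[i].before > best) {
			best = f[i].after - f[i].before;
			worst = &f[i];
		}
	}
	return worst;
}
```

Filling the array with the nine frames listed above, from
Xintr_ioapic_edge24_untramp up to ieee80211_find_node_for_beacon, gives
the amount of stack this interrupt adds to whatever thread it lands on,
and largest_growth() points at the frame to look at first.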


Not sure what we want to do here.
 - if this did start after updating the BIOS, see if there's a newer one 
   or maybe downgrade
 - if we can identify an increase in stack use in an interrupt path, we 
   should fix that
 - making aml_parse() iterative instead of recursive...by tracking frames
   of AML state in an explicit stack...would be annoying, more complex to
   maintain, and probably inefficient.  Maybe it's time to let kernel
   threads request a larger-than-default stack size and have acpi_thread
   request another page or so?
 - if all else fails, there's always increasing UPAGES...  <barf>
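For comparison, this is roughly what the recursive-to-iterative
transformation looks like on a toy tree in C. It is a sketch of the
general technique, not of aml_parse(): struct node, sum_recursive() and
sum_iterative() are hypothetical. The iterative version keeps its pending
work in a heap-allocated array that grows on demand, so nesting depth no
longer costs call frames on the thread's stack.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for nested scopes: a value plus up to two nested
 * children. This is NOT the AML parser's data structure.
 */
struct node {
	long		 value;
	struct node	*kids[2];
};

/* Recursive version: each nesting level costs one C call frame. */
static long
sum_recursive(const struct node *n)
{
	if (n == NULL)
		return 0;
	return n->value + sum_recursive(n->kids[0]) + sum_recursive(n->kids[1]);
}

/*
 * Iterative version: pending nodes live in a heap-allocated array, so
 * depth is limited by available memory, not by the stack size.
 * Returns 0 and stores the sum in *out, or -1 if allocation fails.
 */
static int
sum_iterative(const struct node *root, long *out)
{
	size_t cap = 16, top = 0;
	const struct node **stack;
	long total = 0;

	if ((stack = malloc(cap * sizeof(*stack))) == NULL)
		return -1;
	if (root != NULL)
		stack[top++] = root;
	while (top > 0) {
		const struct node *n = stack[--top];

		total += n->value;
		for (int i = 0; i < 2; i++) {
			if (n->kids[i] == NULL)
				continue;
			if (top == cap) {	/* grow the explicit stack */
				const struct node **tmp;

				tmp = realloc(stack, 2 * cap * sizeof(*stack));
				if (tmp == NULL) {
					free(stack);
					return -1;
				}
				stack = tmp;
				cap *= 2;
			}
			stack[top++] = n->kids[i];
		}
	}
	free(stack);
	*out = total;
	return 0;
}
```

Even in this small example, the iterative version needs growth and
allocation-failure handling that the recursive one gets for free from the
call stack, which is the maintenance cost mentioned above; real parser
state per level would also be much larger than a single pointer.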


Philip Guenther
