Re: double fault trap, code=0

Giovanni Bechis Tue, 31 Jul 2018 16:01:58 -0700

On Tue, Jul 31, 2018 at 02:04:53PM -0700, Philip Guenther wrote:
> On Tue, 31 Jul 2018, [email protected] wrote:
> > >Synopsis:  Every now and then I hit ddb with double fault trap, code=0
> > >Category:  acpi
> > >Environment:
> >     System      : OpenBSD 6.3
> >     Details     : OpenBSD 6.3-current (GENERIC) #143: Fri Jul 27 04:38:01 MDT 2018
> >                      [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
> > >Description:
> >     Every couple of days I hit ddb:
> >     double fault trap, code=0
> >     Stopped at __mtx_enter+0xf:     pushq %r11
> >     ddb{0}> bt
> >     __mtx_enter(0) at __mtx_enter+0xf
> >     i915_get_crtc_scanoutpos(f69da68b441aff13,ffff800000169156,ffff80000015f800,ffff80000015f800,1,0) at i915_get_crtc_scanoutpos+0xce
> >     drm_calc_vbltimestamp_from_scanoutpos(6551790fe8f1e00a,0,ffff80000015f800,0,ffff80000015f800,453d) at drm_calc_vbltimestamp_from_scanoutpos+0x92
> >     drm_update_vblank_count() at drm_update_vblank_count+0x9b
> >     drm_handle_vblank() at drm_handle_vblank+0xd1
> >     ironlake_irq_handler(575039e4693defc4,ffff80000015d700) at ironlake_irq_handler+0x320
> >     intr_handler(84d668fbeb1151573,0) at intr_handler+0x68
> >     Xintr_ioapic_edge16_untramp(0,0,1,0,ffffffff81b329e8,ffffff012cdc33f0) at Xintr_ioapic_edge16_untramp+0x19f
> >     uvm_map_addr_RBT_AUGMENT(1aa311ba321d33a4) at uvm_map_addr_RBT_AUGMENT
> >     uvm_mapent_addr_remove(ffffffff81c9ba58,ffff800032d20000) at uvm_mapent_addr_remove+0x67
> >     uvm_mapent_mkfree(709092b46cd75947,ffff800032d20000,ffffff012cdc33f0,ffffffff81c9ba58,ffffff012cdc3000) at uvm_mapent_mkfree+0xc9
> >     uvm_unmap_remove(da20d5b12b851735,ffff800032d2000,ffffffff81c9ba58,ffff800032cb2540,ffff800032d1f000,1) at uvm_unmap_remove+0x2cf
> >     uvm_unmap(709092b46c9e3347,ffff800032d1f000,ffff800032d20000) at uvm_unmap+0x75
> >     km_free(4033f06f727571cc,514,0,1000) at km_free+0x4f
> >     _bus_space_unmap(ddf1a0675f9a0f6,1,0,ffffffff81b7aaf8) at _bus_space_unmap+0xdd
> >     acpi_gasio(ad65595461093493,0,0,ffff8000009293a0,ffff800032cb2768,1) at acpi_gasio+0x242
> >     aml_opreg_sysmem_handler(14f676a0776defdb,ffff800032cb2748,ffffffff818347d0,ffff800032cb26d0,ad65595461093493) at aml_opreg_sysmem_handler+0x30
> >     aml_rwgen(221071f83d28047,ffff800000929388,ffff800000062088,28a2,ffff80000048188,1) at aml_rwgen+0x650
> >     aml_rwfield(771ec935baef5db0,ffff80000075f308,69,69,ffff800000062088) at aml_rwfield+0x3a5
> >     aml_eval(79dedd78a5dfabfb,ffff80000075f308,ffff80000035031,69,ffff800000062088) at aml_eval+0x1f7
> >     aml_parse(de6bf302f7589bb6,ffff80000075f308,ffff800000035021) at aml_parse+0x54
> >     ....three more pages of the last line....
> >     aml_eval(e4d65b8caee09c80,0,ffff800000089408,2,0) at aml_eval+0x323
> >     aml_evalnode(bcb11c8975c0ae9,ffff800000026400,ffff800000026400,2,0) at aml_evalnode+0xae
> >     acpi_gpe(2c80ceef08cd0301,ffff800000026400,ffff80000002bc40) at acpi_gpe+0x35
> >     acpi_thread(0) at acpi_thread+0x188
> >     end trace frame: 0x0, count: -65
> 
> And quoting a previous off-list email:
> > Every now and then, starting at least a month ago, my laptop
> > enters ddb with "Double fault trap, code=0".
> > Most of the time it is in ieee80211, and at first glance I
> > looked at iwm(4), but today it also happened with intel(4).
> 
> So it's double-faulting because it's running off the end of the kernel 
> stack for the ACPI thread due to a combination of deeply nested AML and 
> stack usage by the DRM and/or 802.11 interrupt handlers.
> 
> I don't see any recent changes in the ACPI stack which would cause a 
> change in behavior on this box (it doesn't have GenericSerialBus, or _DSD 
> properties, or an sdhc device), so either
>  a) the change in stack consumption is from the DRM and 802.11 side, OR
>  b) did you update the BIOS around the time this started?
>       > bios0: vendor TOSHIBA version "Version 5.10" date 04/18/2018
>     Perhaps the new version uses more deeply nested AML.
> 
I do not have enough dmesg log files to be sure, but it could be related to a
BIOS update that I had completely forgotten about.


> 
> For those wondering about the iwm/802.11 case, the photo previously sent 
> had the trace of the interrupt frame going, from the bottom up:
> 
> -> Xintr_ioapic_edge24_untramp
> -> intr_handler
> -> iwm_intr
> -> iwm_rx_pkt
> -> iwm_rx_mpdu
> -> iwm_rx_frame
> -> ieee80211_input
> -> ieee80211_recv_probe_resp
> -> ieee80211_find_node_for_beacon
> 
> Are any of those using more stack space than before?
> 
> 
> Not sure what we want to do here.
>  - if this did start after updating the BIOS, see if there's a newer one 
>    or maybe downgrade
There isn't an update available, and a downgrade does not seem to be possible.

>  - if we can identify an increase in stack use in an interrupt path, we 
>    should fix that
>  - making aml_parse() iterative instead of recursive...by tracking frames 
>    of AML state in an explicit stack...would be annoying, more complex to
>    maintain, and probably inefficient.  Maybe it's time to let kernel 
>    threads request a larger than default stack size and have acpi_thread 
>    request another page or so?
>  - if all else fails, there's always increasing UPAGES...  <barf>
> 
> 
> Philip Guenther
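For readers unfamiliar with the "explicit stack" option in the list above, here
is a minimal userland C sketch of the general technique on a toy tree. It is
not aml_parse() or anything from the OpenBSD tree; struct node, struct frame,
sum_recursive(), sum_iterative() and make_chain() are hypothetical names made up
for illustration. The recursive walk spends one C stack frame per nesting
level, which, per the analysis above, is part of what pushes the ACPI thread off
the end of its kernel stack. The iterative walk keeps its frames in a growable
heap array instead, so depth is bounded by memory rather than by the fixed
thread stack. A real AML frame would have to carry far more state than the two
fields used here, which is where the extra complexity would come from.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical toy node standing in for a nested AML scope: a value plus
 * child nodes. This is NOT OpenBSD's struct aml_value or aml_parse().
 */
struct node {
	long value;
	size_t nchildren;
	struct node **children;
};

/* Recursive walk: every nesting level costs one C stack frame. */
long
sum_recursive(const struct node *n)
{
	long total = n->value;

	for (size_t i = 0; i < n->nchildren; i++)
		total += sum_recursive(n->children[i]);
	return total;
}

/* Explicit "frame": the node being visited and its next child index. */
struct frame {
	const struct node *n;
	size_t next;
};

/*
 * Iterative walk: frames live in a heap array that doubles on demand,
 * so nesting depth is limited by memory rather than by the thread's
 * fixed-size stack. Returns -1 on allocation failure (toy convention).
 */
long
sum_iterative(const struct node *root)
{
	size_t cap = 16, depth = 0;
	struct frame *stack;
	long total;

	assert(root != NULL);
	stack = malloc(cap * sizeof(*stack));
	if (stack == NULL)
		return -1;
	total = root->value;
	stack[depth++] = (struct frame){ root, 0 };

	while (depth > 0) {
		struct frame *top = &stack[depth - 1];

		if (top->next == top->n->nchildren) {
			depth--;		/* node finished: pop its frame */
			continue;
		}
		const struct node *child = top->n->children[top->next++];
		total += child->value;

		if (depth == cap) {		/* grow before pushing */
			struct frame *bigger =
			    realloc(stack, 2 * cap * sizeof(*stack));
			if (bigger == NULL) {
				free(stack);
				return -1;
			}
			stack = bigger;		/* 'top' is stale now; not reused */
			cap *= 2;
		}
		stack[depth++] = (struct frame){ child, 0 };	/* "recurse" */
	}
	free(stack);
	return total;
}

/* Hypothetical helper: a chain n levels deep, node i holding value i + 1. */
struct node *
make_chain(size_t n)
{
	struct node *nodes = calloc(n, sizeof(*nodes));
	struct node **links = calloc(n, sizeof(*links));

	if (n == 0 || nodes == NULL || links == NULL) {
		free(nodes);
		free(links);
		return NULL;
	}
	for (size_t i = 0; i < n; i++) {
		nodes[i].value = (long)i + 1;
		nodes[i].children = &links[i];
		nodes[i].nchildren = (i + 1 < n) ? 1 : 0;
		if (i + 1 < n)
			links[i] = &nodes[i + 1];
	}
	return nodes;	/* free with free(nodes->children); free(nodes); */
}
```

For example, make_chain(1000) builds a chain 1000 levels deep and both walks
return 500500 for it; only the iterative one stays independent of the C stack
size as the nesting depth grows.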
