---- On Tue, 27 Dec 2016 12:51:37 -0800 Christian König 
<[email protected]> wrote ---- 
 > It's a well known problem that the completion interrupts are notoriously 
 > unreliable.
 > 
 > That's why we have a fallback timer in amdgpu_fence.c which kicks an 
 > extra hardware probe after a certain timeout. Please double check that 
 > this one is working as expected.

I'm digging into why the fallback process isn't signalling the straggling 
fences. Here is the fence-processing loop with my added printfs:


        do {
                last_seq = atomic_read(&ring->fence_drv.last_seq);
                seq = amdgpu_fence_read(ring);

        } while (atomic_cmpxchg(&drv->last_seq, last_seq, seq) != last_seq);

        if (seq != ring->fence_drv.sync_seq) {
                printf("rescheduling fallback for %s\n", ring->name);
                amdgpu_fence_schedule_fallback(ring);
        }
        if (unlikely(seq == last_seq)) {
                printf("seek == last_seq == %u skipping fence_process\n", seq);
                return;
        }

And the resulting log:

Dec 28 00:22:31 daleks kernel: &fence->finished at 79042060348 f 353#2026: signaled from irq context
Dec 28 00:22:31 daleks kernel: fence at 79042062972 f 0#4598: signaled from process context
Dec 28 00:22:31 daleks kernel: &fence->scheduled at 79042069573 f 74#2353: signaled from irq context
Dec 28 00:22:31 daleks kernel: skipping fallback scheduling for gfx
Dec 28 00:22:31 daleks kernel: &fence->finished at 79042112606 f 75#2353: signaled from irq context
Dec 28 00:22:31 daleks kernel: fence at 79042115268 f 0#4599: signaled from process context
Dec 28 00:22:31 daleks kernel: &fence->scheduled at 79042168961 f 352#2027: signaled from irq context
Dec 28 00:22:31 daleks kernel: skipping fallback scheduling for gfx
Dec 28 00:22:31 daleks kernel: &fence->finished at 79042234434 f 353#2027: signaled from irq context
Dec 28 00:22:31 daleks kernel: fence at 79042237108 f 0#4600: signaled from process context
Dec 28 00:22:31 daleks kernel: 353#2028 sleeping tid 100721 at 79042673751
Dec 28 00:22:31 daleks kernel: running fence fallback for sdma0
Dec 28 00:22:31 daleks kernel: seek == last_seq == 607 skipping fence_process
Dec 28 00:22:31 daleks kernel: running fence fallback for gfx
Dec 28 00:22:31 daleks kernel: seek == last_seq == 4600 skipping fence_process


It looks like the sequence numbers are saying that the device did in fact 
complete? Too tired to think about it further now.
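
To sanity-check that reading of the log, here is a minimal userspace sketch of 
the loop above. It is not the driver code: struct model_ring, the model_* 
helpers and the starting value 4599 are made up for illustration, and hw_seq 
merely stands in for whatever amdgpu_fence_read() returns. Under those 
assumptions, once the value read from the device equals last_seq there is 
nothing new to signal, and if it also equals sync_seq the fallback timer is not 
re-armed either:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-ring fence state. */
struct model_ring {
    _Atomic uint32_t last_seq; /* last sequence number the driver has seen */
    uint32_t hw_seq;           /* value the device last wrote to fence memory */
    uint32_t sync_seq;         /* last sequence number emitted to the ring */
    unsigned fallbacks;        /* how many times the fallback was re-armed */
};

static void model_ring_init(struct model_ring *ring, uint32_t last,
                            uint32_t hw, uint32_t sync)
{
    atomic_init(&ring->last_seq, last);
    ring->hw_seq = hw;
    ring->sync_seq = sync;
    ring->fallbacks = 0;
}

/*
 * Mirrors the structure of the quoted loop: publish the device's sequence
 * number into last_seq, re-arm the fallback if work is still outstanding,
 * and report whether anything new completed.
 * Returns false when the "seq == last_seq" early return would be taken.
 */
static bool model_fence_process(struct model_ring *ring)
{
    uint32_t last_seq, seq;

    do {
        last_seq = atomic_load(&ring->last_seq);
        seq = ring->hw_seq; /* stand-in for amdgpu_fence_read(ring) */
    } while (!atomic_compare_exchange_strong(&ring->last_seq, &last_seq, seq));

    if (seq != ring->sync_seq)
        ring->fallbacks++;  /* emitted work not yet completed by the device */

    return seq != last_seq;
}

/* Replays the gfx numbers from the log; 4599 is an assumed starting point. */
static void model_demo(void)
{
    struct model_ring gfx;

    model_ring_init(&gfx, 4599, 4600, 4600);
    assert(model_fence_process(&gfx));  /* 4600 is new: fences get signalled */
    assert(!model_fence_process(&gfx)); /* seq == last_seq: early return */
    assert(gfx.fallbacks == 0);         /* seq == sync_seq: no re-arm */

    gfx.sync_seq = 4601;                /* one more job emitted, not done */
    assert(!model_fence_process(&gfx)); /* still nothing new from the device */
    assert(gfx.fallbacks == 1);         /* but the fallback is re-armed */
}
```

Calling model_demo() from a small main() exercises all three cases. If the real 
ring behaves like this model, the fallback can only help when the device has 
written a newer sequence number than the driver has already seen, which would 
be consistent with the "skipping fence_process" lines above.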

 > 
 > Another possibility is that the memory where the fence is written 
 > doesn't have the proper attributes (e.g. USWC vs. cached vs. uncached).

The only places where I see memory attributes being set are amdgpu_device_init, 
for rmmio, and amdgpu_doorbell_init, for the doorbell BAR mapping. The ioremap 
function remaps the memory as uncacheable. The driver is unmodified from Linus' 
tree as of "drm/amdgpu: add gart recovery by gtt list V2", about two thirds of 
the way through 4.9-rc1 (modulo git merge issues). Is there anywhere else I 
should be looking? Turning on INVARIANTS, which scribbles over memory on free 
(and thus aggressively flushes the cache), makes the hangs take much longer to 
occur, which leads me to believe that it may well be a memory typing issue.
 

Thanks for getting back to me. 

-M

P.S.

A bit of a tangent, but maybe you could also clarify whether I'm doing something 
wrong when replaying commits from Linus' tree. The way I get the changesets and 
their sequence is by doing:
% git format-patch v4.8..v4.9-rc1 drivers/gpu/drm/*.* drivers/gpu/drm/i915 \
    drivers/gpu/drm/amd drivers/gpu/drm/radeon include/drm include/uapi/drm

'git am' fails much of the time even when there aren't conflicts, so instead I 
git cherry-pick the changesets in the order in which they appear in the 
generated patches. I frequently end up with empty commits, and sometimes the 
drivers do not end up with all the requisite changes merged in, so they don't 
compile.




 > Regards,
 > Christian.
 > 
 > Am 26.12.2016 um 02:54 schrieb Matthew Macy:
 > > I'm running an rx460 using the amdgpu driver from Linux 4.8 with Mesa 
 > > 13/LLVM 3.9 and Xorg 1.18 on FreeBSD. It seems to largely perform pretty 
 > > well.
 > >
 > > However, ever since I got Mesa working I will inevitably end up losing 
 > > completion interrupts after X has been running for a brief period. I can 
 > > bring the problem on more quickly by running glxgears with vblank_mode=0. 
 > > It's a safe bet that the problem is with the linuxkpi. However, since this 
 > > bug is manifesting itself in a very hardware specific way I'm coming here 
 > > for advice on what I can do to dump device state to better understand why 
 > > it ceases to fire interrupts.
 > >
 > > I enabled FENCE_TRACE and added some logging to fence creation and 
 > > fence_default_wait as well. The last interrupt in this particular excerpt 
 > > is:
 > >
 > > "Dec 22 22:36:22 daleks kernel: fence at 210850477167 f 0#233745: signaled 
 > > from irq context"
 > >
 > > amdgpu_cs_wait goes on to sleep on 411#116530 and never wake up. Any 
 > > guidance would be much appreciated. Thanks in advance.
 > >
 > >
 > >
 > >
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100793, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl]
 > > Dec 22 22:36:22 daleks kernel: &fence->scheduled at 210850212762 f 
 > > 86#116944: signaled from irq context
 > > Dec 22 22:36:22 daleks kernel: pid=100699, dev=0xe200, auth=1, 
 > > AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:amdgpu_ih_process] [drm:drm_ioctl] 
 > > pid=100793, dev=0xe200, auth=1, AMDGPU_CS
 > > Dec 22 22:36:22 daleks kernel: amdgpu_ih_process: rptr 864, wptr 880
 > > Dec 22 22:36:22 daleks kernel: [drm:gfx_v8_0_eop_irq] IH: CP EOP
 > > Dec 22 22:36:22 daleks kernel: &fence->finished at 210850251259 f 
 > > 411#116528: signaled from irq context
 > > Dec 22 22:36:22 daleks kernel: fence at 210850253222 f 0#233742: signaled 
 > > from irq context
 > > Dec 22 22:36:22 daleks kernel: [drm:amdgpu_ih_process] amdgpu_ih_process: 
 > > rptr 880, wptr 880
 > > Dec 22 22:36:22 daleks kernel: [drm:amdgpu_ih_process] created fence 
 > > 410#116529 411#116529 @210850271550
 > > Dec 22 22:36:22 daleks kernel: amdgpu_ih_process: rptr 880, wptr 896
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] [drm:gfx_v8_0_eop_irq] 
 > > pid=100793, dev=0xe200, auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: IH: CP EOP
 > > Dec 22 22:36:22 daleks kernel: &fence->finished at 210850308909 f 
 > > 87#116944: signaled from irq context
 > > Dec 22 22:36:22 daleks kernel: fence at 210850310670 f 0#233743: signaled 
 > > from irq context
 > > Dec 22 22:36:22 daleks kernel: [drm:amdgpu_ih_process] amdgpu_ih_process: 
 > > rptr 896, wptr 896
 > > Dec 22 22:36:22 daleks kernel: &fence->scheduled at 210850325151 f 
 > > 410#116529: signaled from irq context
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100699, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100699, dev=0xe200, 
 > > auth=1, AMDGPU_CS
 > > Dec 22 22:36:22 daleks kernel: created fence 86#116945 87#116945 
 > > @210850375328
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100793, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl]
 > > Dec 22 22:36:22 daleks kernel: &fence->scheduled at 210850389385 f 
 > > 86#116945: signaled from irq context
 > > Dec 22 22:36:22 daleks kernel: [drm:amdgpu_ih_process] pid=100699, 
 > > dev=0xe200, auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] amdgpu_ih_process: rptr 
 > > 896, wptr 912
 > > Dec 22 22:36:22 daleks kernel: [drm:gfx_v8_0_eop_irq] IH: CP EOP
 > > Dec 22 22:36:22 daleks kernel: &fence->finished at 210850416620 f 
 > > 411#116529: signaled from irq context
 > > Dec 22 22:36:22 daleks kernel: fence at 210850418382 f 0#233744: signaled 
 > > from irq context
 > > Dec 22 22:36:22 daleks kernel: pid=100793, dev=0xe200, auth=1, AMDGPU_CS
 > > Dec 22 22:36:22 daleks kernel: [drm:amdgpu_ih_process] created fence 
 > > 410#116530 411#116530 @210850440720
 > > Dec 22 22:36:22 daleks kernel: amdgpu_ih_process: rptr 912, wptr 912
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] [drm:amdgpu_ih_process] 
 > > amdgpu_ih_process: rptr 912, wptr 928
 > > Dec 22 22:36:22 daleks kernel: [drm:gfx_v8_0_eop_irq] IH: CP EOP
 > > Dec 22 22:36:22 daleks kernel: &fence->finished at 210850475397 f 
 > > 87#116945: signaled from irq context
 > > Dec 22 22:36:22 daleks kernel: fence at 210850477167 f 0#233745: signaled 
 > > from irq context
 > > Dec 22 22:36:22 daleks kernel: pid=100793, dev=0xe200, auth=1, 
 > > AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:amdgpu_ih_process] amdgpu_ih_process: 
 > > rptr 928, wptr 928
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100699, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100699, dev=0xe200, 
 > > auth=1, AMDGPU_CS
 > > Dec 22 22:36:22 daleks kernel: created fence 86#116946 87#116946 
 > > @210850557790
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100793, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100699, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100793, dev=0xe200, 
 > > auth=1, AMDGPU_CS
 > > Dec 22 22:36:22 daleks kernel: created fence 410#116531 411#116531 
 > > @210850614023
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100793, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100699, dev=0xe200, 
 > > auth=1, AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100699, dev=0xe200, 
 > > auth=1, AMDGPU_CS
 > > Dec 22 22:36:22 daleks kernel: created fence 86#116947 87#116947 
 > > @210850719230
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] pid=100793, dev=0xe200, 
 > > auth=1, AMDGPU_WAIT_CS
 > > Dec 22 22:36:22 daleks kernel: [drm:drm_ioctl] amdgpu_cs_wait on 411#116530
 > > Dec 22 22:36:22 daleks kernel: pid=100699, dev=0xe200, auth=1, 
 > > AMDGPU_BO_LIST
 > > Dec 22 22:36:22 daleks kernel: 411#116530 sleeping tid 100793 at 
 > > 210850747487
 > >
 > >
 > > -M
 > >
 > > _______________________________________________
 > > amd-gfx mailing list
 > > [email protected]
 > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
 > 
 > 

