I've sent a patch to the list that fixes the bug on my end.

Cheers,

Tom

On 2018-09-17 2:01 p.m., Tom St Denis wrote:
On 2018-09-17 1:55 p.m., Christian König wrote:
Am 17.09.2018 um 19:50 schrieb Tom St Denis:
On 2018-09-17 1:45 p.m., Christian König wrote:
Mhm, not the slightest idea.

That nearly looks like adev->stolen_vga_memory already contains something.

Nope,

[   51.564605] >>>adev->stolen_vga_memory == (null)
[   51.564619] kasan: CONFIG_KASAN_INLINE enabled
[   51.564877] kasan: GPF could be caused by NULL-ptr deref or user memory access
[   51.565071] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   51.565254] CPU: 6 PID: 3863 Comm: modprobe Not tainted 4.19.0-rc1+ #30
[   51.565425] Hardware name: System manufacturer System Product Name/TUF B350M-PLUS GAMING, BIOS 4011 04/19/2018
[   51.565714] RIP: 0010:amdgpu_bo_create_kernel+0x59/0x1a0 [amdgpu]

That's me printing out the value of stolen_vga_memory before the call to allocate it.

What does amdgpu_bo_create_kernel+0x59 point to?

I've never really gotten line numbers to work with the kernel, but if I had to guess I'd say right here:

int amdgpu_bo_create_kernel(struct amdgpu_device *adev,
                 unsigned long size, int align,
                 u32 domain, struct amdgpu_bo **bo_ptr,
                 u64 *gpu_addr, void **cpu_addr)
{
     int r;

     r = amdgpu_bo_create_reserved(adev, size, align, domain, bo_ptr,
                       gpu_addr, cpu_addr);

     if (r)
         return r;

*bo_ptr is NULL ===>    amdgpu_bo_unreserve(*bo_ptr);

     return 0;
}

Which then results in

static inline void amdgpu_bo_unreserve(struct amdgpu_bo *bo)
{
     ttm_bo_unreserve(&bo->tbo);
}

Which then passes the address NULL + offsetof(tbo) to ttm_bo_unreserve:

static inline void ttm_bo_unreserve(struct ttm_buffer_object *bo)
{
         if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
                 spin_lock(&bo->bdev->glob->lru_lock);
                 ttm_bo_add_to_lru(bo);
                 spin_unlock(&bo->bdev->glob->lru_lock);
         }
         reservation_object_unlock(bo->resv);
}


Which likely faults on reading bo->mem.placement since the address is bogus.
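To make the "NULL + offsetof(tbo)" point concrete, here is a minimal user-space sketch. The struct names and layouts are hypothetical stand-ins, not the real amdgpu/TTM definitions; the only assumption carried over is that tbo is not the first member of the outer struct. The address is computed with integer arithmetic so the sketch itself stays well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real amdgpu/TTM structs are far larger. */
struct fake_ttm_bo {
    uint32_t placement;
};

struct fake_amdgpu_bo {
    uint64_t flags;          /* a member ahead of tbo, so its offset is non-zero */
    struct fake_ttm_bo tbo;
};

/*
 * Address that "&bo->tbo" would yield for a given value of bo,
 * expressed as plain integer arithmetic.
 */
static uintptr_t tbo_address(uintptr_t bo)
{
    return bo + offsetof(struct fake_amdgpu_bo, tbo);
}

/* What a simple "is it NULL?" test on the derived pointer would report. */
static int looks_non_null(uintptr_t addr)
{
    return addr != 0;
}
```

With bo equal to 0, tbo_address() returns the small non-zero offset of tbo, so the callee would not see a NULL pointer even though nothing valid lives at that address.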

The report is from amdgpu_bo_create_kernel because everything is a macro or inlined... :-)
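For illustration only, one way to avoid the dereference is to skip the unreserve when no BO came back. The sketch below uses mock types and functions (mock_bo, mock_create_reserved, mock_unreserve, mock_create_kernel are all hypothetical), and it assumes the create step can return success while leaving *bo_ptr NULL, which is what the trace suggests. It is not necessarily what any patch on the list does.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer object. */
struct mock_bo {
    int reserved;
};

/*
 * Mock of the "create and reserve" step. Assumption for this sketch:
 * a zero size yields success (0) with *bo_ptr left NULL.
 */
static int mock_create_reserved(unsigned long size, struct mock_bo **bo_ptr)
{
    static struct mock_bo storage;

    if (size == 0) {
        *bo_ptr = NULL;
        return 0;
    }
    storage.reserved = 1;
    *bo_ptr = &storage;
    return 0;
}

/* Mock unreserve: dereferences bo, so it must never be handed NULL. */
static void mock_unreserve(struct mock_bo *bo)
{
    bo->reserved = 0;
}

/* Same shape as the function quoted above, plus a NULL guard. */
static int mock_create_kernel(unsigned long size, struct mock_bo **bo_ptr)
{
    int r = mock_create_reserved(size, bo_ptr);

    if (r)
        return r;

    if (*bo_ptr)                 /* only unreserve a BO that exists */
        mock_unreserve(*bo_ptr);

    return 0;
}
```

Calling mock_create_kernel(0, &bo) then returns 0 with bo set to NULL and never reaches the unreserve path.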

Tom


Christian.


Tom



Christian.

Am 17.09.2018 um 18:47 schrieb Tom St Denis:
On 2018-09-17 12:21 p.m., Tom St Denis wrote:
(attached).  I'll try to bisect in a second.  Is anyone aware of this?

Tom

Bisection led to:

a327772a5655ff4fb104c8aae6515faa461df466 is the first bad commit
commit a327772a5655ff4fb104c8aae6515faa461df466
Author: Christian König <christian.koe...@amd.com>
Date:   Fri Sep 14 21:06:50 2018 +0200

    drm/amdgpu: drop size check

    We no don't allocate zero sized kernel BOs any longer.

    Signed-off-by: Christian König <christian.koe...@amd.com>
    Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

:040000 040000 265e4fa231d367d354e4c66600b8f98a4d2f04c4 3702baaeb2423361dcd7eac8c533edace760ae3e M      drivers


As the culprit.

Cheers,
Tom


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


