[Public]

Hi,
After investigating this issue for quite some time, I found that the freeze is not caused by the amdgpu part of the buddy allocator patch: that patch doesn't throw any issues when applied separately on top of the stable base of drm-next.

After digging further into this, the patch below seems to be the cause of the problem:

drm/ttm: rework bulk move handling v5
https://cgit.freedesktop.org/drm/drm/commit/?id=fee2ede155423b0f7a559050a39750b98fe9db69

When this patch is applied on top of the stable (working) version of drm-next without the buddy allocator patch, we see the multiple issues listed below, thrown at random on each GravityMark run:

1. general protection fault at ttm_lru_bulk_move_tail()
2. NULL pointer dereference at ttm_lru_bulk_move_tail()
3. NULL pointer dereference at ttm_resource_init()

Regards,
Arun.

-----Original Message-----
From: Alex Deucher <[email protected]>
Sent: Monday, May 16, 2022 8:36 PM
To: Mike Lothian <[email protected]>
Cc: Paneer Selvam, Arunpravin <[email protected]>; Intel Graphics Development <[email protected]>; amd-gfx list <[email protected]>; Maling list - DRI developers <[email protected]>; Deucher, Alexander <[email protected]>; Koenig, Christian <[email protected]>; Matthew Auld <[email protected]>
Subject: Re: [PATCH v12] drm/amdgpu: add drm buddy support to amdgpu

On Mon, May 16, 2022 at 8:40 AM Mike Lothian <[email protected]> wrote:
>
> Hi
>
> The merge window for 5.19 will probably be opening next week, has
> there been any progress with this bug?

It took a while to find a combination of GPUs that would repro the issue, but now that we can, it is still being investigated.

Alex

>
> Thanks
>
> Mike
>
> On Mon, 2 May 2022 at 17:31, Mike Lothian <[email protected]> wrote:
> >
> > On Mon, 2 May 2022 at 16:54, Arunpravin Paneer Selvam
> > <[email protected]> wrote:
> > >
> > >
> > >
> > > On 5/2/2022 8:41 PM, Mike Lothian wrote:
> > > > On Wed, 27 Apr 2022 at 12:55, Mike Lothian <[email protected]> wrote:
> > > >> On Tue, 26 Apr 2022 at 17:36, Christian König
> > > >> <[email protected]> wrote:
> > > >>> Hi Mike,
> > > >>>
> > > >>> sounds like somehow stitching together the SG table for PRIME
> > > >>> doesn't work any more with this patch.
> > > >>>
> > > >>> Can you try with P2P DMA disabled?
> > > >> -CONFIG_PCI_P2PDMA=y
> > > >> +# CONFIG_PCI_P2PDMA is not set
> > > >>
> > > >> If that's what you're meaning, then there's no difference, I'll
> > > >> upload my dmesg to the gitlab issue
> > > >>
> > > >>> Apart from that can you take a look Arun?
> > > >>>
> > > >>> Thanks,
> > > >>> Christian.
> > > > Hi
> > > >
> > > > Have you had any success in replicating this?
> > > Hi Mike,
> > > I couldn't replicate on my Raven APU machine. I see you have 2
> > > cards initialized, one is Renoir and the other is Navy Flounder.
> > > Could you give some more details, are you running Gravity Mark on
> > > Renoir and what is your system RAM configuration?
> > > >
> > > > Cheers
> > > >
> > > > Mike
> > >
> > Hi
> >
> > It's a PRIME laptop, it failed on the RENOIR too, it caused a
> > lockup, but systemd managed to capture it, I'll attach it to the
> > issue
> >
> > I've got 64GB RAM, the 6800M has 12GB VRAM
> >
> > Cheers
> >
> > Mike
