Re: [PATCH] drm/radeon: fix VM page table setup on SI
On Fre, 2012-06-29 at 14:07 -0400, Jerome Glisse wrote: On Fri, Jun 29, 2012 at 12:14 PM, Michel Dänzer mic...@daenzer.net wrote: On Fre, 2012-06-29 at 11:28 -0400, Jerome Glisse wrote: On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com wrote: On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote: On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote: From: Alex Deucher alexander.deuc...@amd.com Cayman and trinity allow for variable sized VM page tables, but SI requires that all page tables be the same size. The current code assumes variablely sized VM page tables so SI may end up with part of each page table overlapping with other memory which could end up being interpreted by the VM hw as garbage. Change the code to better accomodate SI. Allocate enough space for at least 2 full page tables and always set last_pfn to max_pfn on SI so each VM is backed by a full page table. This limits us to only 2 VMs active at any given time on SI. This will be rectified and the code can be reunified once we move to two level page tables. Signed-off-by: Alex Deucher alexander.deuc...@amd.com This change breaks the radeonsi driver for me. egltri_screen (the 'golden' test for radeonsi at least basically working) locks up the GPU. I don't have any details about the lockup yet, as the GPU reset attempt hangs the machine. Any ideas offhand what radeonsi might be doing wrong? Maybe trying to access an unmapped page that happened to work by accident before and now causes a fault in the VM which halts the MC? Indeed, looks like it: radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000FF01B radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0202400C Oddly, while I have seen similar errors before (so at least some access to unmapped pages was caught even before your patch), I hadn't noticed them for a while with egltri_screen... Anyway, some more experimentation shows that it doesn't happen if I skip the clear, and it still happens when doing only a clear. I'll look into what might be wrong with the clears next week. Yeah only thing i can think of, can you get dump of various mc fault reg after lockup ? Did you have any particular registers in mind? I am guessing it's related to default page behavior, previously to this patch you would likely ended up writting/reading to the dummy page and thus not getting the segfault you deserved. With this patch you get the segfault you deserve ;) Actually, the problem doesn't occur when applying the patch to current drm-core-next. I'm guessing it was some kind of backend / tiling setup issue that's been fixed in the meantime. Thanks for the help anyway, guys. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix VM page table setup on SI
On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote: From: Alex Deucher alexander.deuc...@amd.com Cayman and trinity allow for variable sized VM page tables, but SI requires that all page tables be the same size. The current code assumes variablely sized VM page tables so SI may end up with part of each page table overlapping with other memory which could end up being interpreted by the VM hw as garbage. Change the code to better accomodate SI. Allocate enough space for at least 2 full page tables and always set last_pfn to max_pfn on SI so each VM is backed by a full page table. This limits us to only 2 VMs active at any given time on SI. This will be rectified and the code can be reunified once we move to two level page tables. Signed-off-by: Alex Deucher alexander.deuc...@amd.com This change breaks the radeonsi driver for me. egltri_screen (the 'golden' test for radeonsi at least basically working) locks up the GPU. I don't have any details about the lockup yet, as the GPU reset attempt hangs the machine. Any ideas offhand what radeonsi might be doing wrong? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix VM page table setup on SI
On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote: On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote: From: Alex Deucher alexander.deuc...@amd.com Cayman and trinity allow for variable sized VM page tables, but SI requires that all page tables be the same size. The current code assumes variablely sized VM page tables so SI may end up with part of each page table overlapping with other memory which could end up being interpreted by the VM hw as garbage. Change the code to better accomodate SI. Allocate enough space for at least 2 full page tables and always set last_pfn to max_pfn on SI so each VM is backed by a full page table. This limits us to only 2 VMs active at any given time on SI. This will be rectified and the code can be reunified once we move to two level page tables. Signed-off-by: Alex Deucher alexander.deuc...@amd.com This change breaks the radeonsi driver for me. egltri_screen (the 'golden' test for radeonsi at least basically working) locks up the GPU. I don't have any details about the lockup yet, as the GPU reset attempt hangs the machine. Any ideas offhand what radeonsi might be doing wrong? Maybe trying to access an unmapped page that happened to work by accident before and now causes a fault in the VM which halts the MC? Alex -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix VM page table setup on SI
On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com wrote: On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote: On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote: From: Alex Deucher alexander.deuc...@amd.com Cayman and trinity allow for variable sized VM page tables, but SI requires that all page tables be the same size. The current code assumes variablely sized VM page tables so SI may end up with part of each page table overlapping with other memory which could end up being interpreted by the VM hw as garbage. Change the code to better accomodate SI. Allocate enough space for at least 2 full page tables and always set last_pfn to max_pfn on SI so each VM is backed by a full page table. This limits us to only 2 VMs active at any given time on SI. This will be rectified and the code can be reunified once we move to two level page tables. Signed-off-by: Alex Deucher alexander.deuc...@amd.com This change breaks the radeonsi driver for me. egltri_screen (the 'golden' test for radeonsi at least basically working) locks up the GPU. I don't have any details about the lockup yet, as the GPU reset attempt hangs the machine. Any ideas offhand what radeonsi might be doing wrong? Maybe trying to access an unmapped page that happened to work by accident before and now causes a fault in the VM which halts the MC? Alex Yeah only thing i can think of, can you get dump of various mc fault reg after lockup ? Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix VM page table setup on SI
On Fre, 2012-06-29 at 11:28 -0400, Jerome Glisse wrote: On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com wrote: On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote: On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote: From: Alex Deucher alexander.deuc...@amd.com Cayman and trinity allow for variable sized VM page tables, but SI requires that all page tables be the same size. The current code assumes variablely sized VM page tables so SI may end up with part of each page table overlapping with other memory which could end up being interpreted by the VM hw as garbage. Change the code to better accomodate SI. Allocate enough space for at least 2 full page tables and always set last_pfn to max_pfn on SI so each VM is backed by a full page table. This limits us to only 2 VMs active at any given time on SI. This will be rectified and the code can be reunified once we move to two level page tables. Signed-off-by: Alex Deucher alexander.deuc...@amd.com This change breaks the radeonsi driver for me. egltri_screen (the 'golden' test for radeonsi at least basically working) locks up the GPU. I don't have any details about the lockup yet, as the GPU reset attempt hangs the machine. Any ideas offhand what radeonsi might be doing wrong? Maybe trying to access an unmapped page that happened to work by accident before and now causes a fault in the VM which halts the MC? Indeed, looks like it: radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000FF01B radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0202400C Oddly, while I have seen similar errors before (so at least some access to unmapped pages was caught even before your patch), I hadn't noticed them for a while with egltri_screen... Anyway, some more experimentation shows that it doesn't happen if I skip the clear, and it still happens when doing only a clear. I'll look into what might be wrong with the clears next week. Yeah only thing i can think of, can you get dump of various mc fault reg after lockup ? Did you have any particular registers in mind? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix VM page table setup on SI
On Fri, Jun 29, 2012 at 12:14 PM, Michel Dänzer mic...@daenzer.net wrote: On Fre, 2012-06-29 at 11:28 -0400, Jerome Glisse wrote: On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com wrote: On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote: On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote: From: Alex Deucher alexander.deuc...@amd.com Cayman and trinity allow for variable sized VM page tables, but SI requires that all page tables be the same size. The current code assumes variablely sized VM page tables so SI may end up with part of each page table overlapping with other memory which could end up being interpreted by the VM hw as garbage. Change the code to better accomodate SI. Allocate enough space for at least 2 full page tables and always set last_pfn to max_pfn on SI so each VM is backed by a full page table. This limits us to only 2 VMs active at any given time on SI. This will be rectified and the code can be reunified once we move to two level page tables. Signed-off-by: Alex Deucher alexander.deuc...@amd.com This change breaks the radeonsi driver for me. egltri_screen (the 'golden' test for radeonsi at least basically working) locks up the GPU. I don't have any details about the lockup yet, as the GPU reset attempt hangs the machine. Any ideas offhand what radeonsi might be doing wrong? Maybe trying to access an unmapped page that happened to work by accident before and now causes a fault in the VM which halts the MC? Indeed, looks like it: radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000FF01B radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0202400C Oddly, while I have seen similar errors before (so at least some access to unmapped pages was caught even before your patch), I hadn't noticed them for a while with egltri_screen... Anyway, some more experimentation shows that it doesn't happen if I skip the clear, and it still happens when doing only a clear. I'll look into what might be wrong with the clears next week. Yeah only thing i can think of, can you get dump of various mc fault reg after lockup ? Did you have any particular registers in mind? I am guessing it's related to default page behavior, previously to this patch you would likely ended up writting/reading to the dummy page and thus not getting the segfault you deserved. With this patch you get the segfault you deserve ;) Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix VM page table setup on SI
On Thu, Jun 28, 2012 at 5:53 PM, alexdeuc...@gmail.com wrote: From: Alex Deucher alexander.deuc...@amd.com Cayman and trinity allow for variable sized VM page tables, but SI requires that all page tables be the same size. The current code assumes variablely sized VM page tables so SI may end up with part of each page table overlapping with other memory which could end up being interpreted by the VM hw as garbage. Change the code to better accomodate SI. Allocate enough space for at least 2 full page tables and always set last_pfn to max_pfn on SI so each VM is backed by a full page table. This limits us to only 2 VMs active at any given time on SI. This will be rectified and the code can be reunified once we move to two level page tables. Signed-off-by: Alex Deucher alexander.deuc...@amd.com Reviewed-by: Jerome Glisse jgli...@redhat.com Cc: sta...@vger.kernel.org --- drivers/gpu/drm/radeon/radeon_gart.c | 13 +++-- drivers/gpu/drm/radeon/si.c | 4 ++-- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 2b34c1a..8c44d1d 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -289,8 +289,9 @@ int radeon_vm_manager_init(struct radeon_device *rdev) rdev-vm_manager.enabled = false; /* mark first vm as always in use, it's the system one */ + /* allocate enough for 2 full VM pts */ r = radeon_sa_bo_manager_init(rdev, rdev-vm_manager.sa_manager, - rdev-vm_manager.max_pfn * 8, + rdev-vm_manager.max_pfn * 8 * 2, RADEON_GEM_DOMAIN_VRAM); if (r) { dev_err(rdev-dev, failed to allocate vm bo (%dKB)\n, @@ -633,7 +634,15 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) mutex_init(vm-mutex); INIT_LIST_HEAD(vm-list); INIT_LIST_HEAD(vm-va); - vm-last_pfn = 0; + /* SI requires equal sized PTs for all VMs, so always set + * last_pfn to max_pfn. cayman allows variable sized + * pts so we can grow then as needed. Once we switch + * to two level pts we can unify this again. + */ + if (rdev-family = CHIP_TAHITI) + vm-last_pfn = rdev-vm_manager.max_pfn; + else + vm-last_pfn = 0; /* map the ib pool buffer at 0 in virtual address space, set * read only */ diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c index 34603b3c8..127600e 100644 --- a/drivers/gpu/drm/radeon/si.c +++ b/drivers/gpu/drm/radeon/si.c @@ -2365,12 +2365,12 @@ int si_pcie_gart_enable(struct radeon_device *rdev) WREG32(0x15DC, 0); /* empty context1-15 */ - /* FIXME start with 1G, once using 2 level pt switch to full + /* FIXME start with 4G, once using 2 level pt switch to full * vm size space */ /* set vm size, must be a multiple of 4 */ WREG32(VM_CONTEXT1_PAGE_TABLE_START_ADDR, 0); - WREG32(VM_CONTEXT1_PAGE_TABLE_END_ADDR, (1 30) / RADEON_GPU_PAGE_SIZE); + WREG32(VM_CONTEXT1_PAGE_TABLE_END_ADDR, rdev-vm_manager.max_pfn); for (i = 1; i 16; i++) { if (i 8) WREG32(VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (i 2), -- 1.7.7.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel