Re: [PATCH] drm/radeon: fix VM page table setup on SI

2012-07-17 Thread Michel Dänzer
On Fre, 2012-06-29 at 14:07 -0400, Jerome Glisse wrote:
 On Fri, Jun 29, 2012 at 12:14 PM, Michel Dänzer mic...@daenzer.net wrote:
  On Fre, 2012-06-29 at 11:28 -0400, Jerome Glisse wrote:
  On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com 
  wrote:
   On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net 
   wrote:
   On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote:
   From: Alex Deucher alexander.deuc...@amd.com
  
   Cayman and trinity allow for variable sized VM page
   tables, but SI requires that all page tables be the
   same size.  The current code assumes variablely sized
   VM page tables so SI may end up with part of each page
   table overlapping with other memory which could end
   up being interpreted by the VM hw as garbage.
  
   Change the code to better accomodate SI.  Allocate enough
   space for at least 2 full page tables and always set
   last_pfn to max_pfn on SI so each VM is backed by a full
   page table.  This limits us to only 2 VMs active at any
   given time on SI.  This will be rectified and the code can
   be reunified once we move to two level page tables.
  
   Signed-off-by: Alex Deucher alexander.deuc...@amd.com
  
   This change breaks the radeonsi driver for me. egltri_screen (the
   'golden' test for radeonsi at least basically working) locks up the
   GPU.
  
   I don't have any details about the lockup yet, as the GPU reset attempt
   hangs the machine. Any ideas offhand what radeonsi might be doing wrong?
  
   Maybe trying to access an unmapped page that happened to work by
   accident before and now causes a fault in the VM which halts the MC?
 
  Indeed, looks like it:
 
  radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000FF01B
  radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0202400C
 
  Oddly, while I have seen similar errors before (so at
  least some access to unmapped pages was caught even before your patch),
  I hadn't noticed them for a while with egltri_screen...
 
 
  Anyway, some more experimentation shows that it doesn't happen if I skip
  the clear, and it still happens when doing only a clear. I'll look into
  what might be wrong with the clears next week.
 
 
  Yeah only thing i can think of, can you get dump of various mc fault
  reg after lockup ?
 
  Did you have any particular registers in mind?
 
 I am guessing it's related to default page behavior, previously to
 this patch you would likely ended up writting/reading to the dummy
 page and thus not getting the segfault you deserved. With this patch
 you get the segfault you deserve ;)

Actually, the problem doesn't occur when applying the patch to current
drm-core-next. I'm guessing it was some kind of backend / tiling setup
issue that's been fixed in the meantime. Thanks for the help anyway,
guys. 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/radeon: fix VM page table setup on SI

2012-06-29 Thread Michel Dänzer
On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote: 
 From: Alex Deucher alexander.deuc...@amd.com
 
 Cayman and trinity allow for variable sized VM page
 tables, but SI requires that all page tables be the
 same size.  The current code assumes variablely sized
 VM page tables so SI may end up with part of each page
 table overlapping with other memory which could end
 up being interpreted by the VM hw as garbage.
 
 Change the code to better accomodate SI.  Allocate enough
 space for at least 2 full page tables and always set
 last_pfn to max_pfn on SI so each VM is backed by a full
 page table.  This limits us to only 2 VMs active at any
 given time on SI.  This will be rectified and the code can
 be reunified once we move to two level page tables.
 
 Signed-off-by: Alex Deucher alexander.deuc...@amd.com

This change breaks the radeonsi driver for me. egltri_screen (the
'golden' test for radeonsi at least basically working) locks up the
GPU. 

I don't have any details about the lockup yet, as the GPU reset attempt
hangs the machine. Any ideas offhand what radeonsi might be doing wrong?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/radeon: fix VM page table setup on SI

2012-06-29 Thread Alex Deucher
On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote:
 On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote:
 From: Alex Deucher alexander.deuc...@amd.com

 Cayman and trinity allow for variable sized VM page
 tables, but SI requires that all page tables be the
 same size.  The current code assumes variablely sized
 VM page tables so SI may end up with part of each page
 table overlapping with other memory which could end
 up being interpreted by the VM hw as garbage.

 Change the code to better accomodate SI.  Allocate enough
 space for at least 2 full page tables and always set
 last_pfn to max_pfn on SI so each VM is backed by a full
 page table.  This limits us to only 2 VMs active at any
 given time on SI.  This will be rectified and the code can
 be reunified once we move to two level page tables.

 Signed-off-by: Alex Deucher alexander.deuc...@amd.com

 This change breaks the radeonsi driver for me. egltri_screen (the
 'golden' test for radeonsi at least basically working) locks up the
 GPU.

 I don't have any details about the lockup yet, as the GPU reset attempt
 hangs the machine. Any ideas offhand what radeonsi might be doing wrong?

Maybe trying to access an unmapped page that happened to work by
accident before and now causes a fault in the VM which halts the MC?

Alex



 --
 Earthling Michel Dänzer           |                   http://www.amd.com
 Libre software enthusiast         |          Debian, X and DRI developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/radeon: fix VM page table setup on SI

2012-06-29 Thread Jerome Glisse
On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote:
 On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote:
 From: Alex Deucher alexander.deuc...@amd.com

 Cayman and trinity allow for variable sized VM page
 tables, but SI requires that all page tables be the
 same size.  The current code assumes variablely sized
 VM page tables so SI may end up with part of each page
 table overlapping with other memory which could end
 up being interpreted by the VM hw as garbage.

 Change the code to better accomodate SI.  Allocate enough
 space for at least 2 full page tables and always set
 last_pfn to max_pfn on SI so each VM is backed by a full
 page table.  This limits us to only 2 VMs active at any
 given time on SI.  This will be rectified and the code can
 be reunified once we move to two level page tables.

 Signed-off-by: Alex Deucher alexander.deuc...@amd.com

 This change breaks the radeonsi driver for me. egltri_screen (the
 'golden' test for radeonsi at least basically working) locks up the
 GPU.

 I don't have any details about the lockup yet, as the GPU reset attempt
 hangs the machine. Any ideas offhand what radeonsi might be doing wrong?

 Maybe trying to access an unmapped page that happened to work by
 accident before and now causes a fault in the VM which halts the MC?

 Alex


Yeah only thing i can think of, can you get dump of various mc fault
reg after lockup ?

Cheers,
Jerome
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/radeon: fix VM page table setup on SI

2012-06-29 Thread Michel Dänzer
On Fre, 2012-06-29 at 11:28 -0400, Jerome Glisse wrote: 
 On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com wrote:
  On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote:
  On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote:
  From: Alex Deucher alexander.deuc...@amd.com
 
  Cayman and trinity allow for variable sized VM page
  tables, but SI requires that all page tables be the
  same size.  The current code assumes variablely sized
  VM page tables so SI may end up with part of each page
  table overlapping with other memory which could end
  up being interpreted by the VM hw as garbage.
 
  Change the code to better accomodate SI.  Allocate enough
  space for at least 2 full page tables and always set
  last_pfn to max_pfn on SI so each VM is backed by a full
  page table.  This limits us to only 2 VMs active at any
  given time on SI.  This will be rectified and the code can
  be reunified once we move to two level page tables.
 
  Signed-off-by: Alex Deucher alexander.deuc...@amd.com
 
  This change breaks the radeonsi driver for me. egltri_screen (the
  'golden' test for radeonsi at least basically working) locks up the
  GPU.
 
  I don't have any details about the lockup yet, as the GPU reset attempt
  hangs the machine. Any ideas offhand what radeonsi might be doing wrong?
 
  Maybe trying to access an unmapped page that happened to work by
  accident before and now causes a fault in the VM which halts the MC?

Indeed, looks like it:

radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000FF01B
radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0202400C

Oddly, while I have seen similar errors before (so at 
least some access to unmapped pages was caught even before your patch),
I hadn't noticed them for a while with egltri_screen...


Anyway, some more experimentation shows that it doesn't happen if I skip
the clear, and it still happens when doing only a clear. I'll look into
what might be wrong with the clears next week.


 Yeah only thing i can think of, can you get dump of various mc fault
 reg after lockup ?

Did you have any particular registers in mind?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/radeon: fix VM page table setup on SI

2012-06-29 Thread Jerome Glisse
On Fri, Jun 29, 2012 at 12:14 PM, Michel Dänzer mic...@daenzer.net wrote:
 On Fre, 2012-06-29 at 11:28 -0400, Jerome Glisse wrote:
 On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher alexdeuc...@gmail.com wrote:
  On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer mic...@daenzer.net wrote:
  On Don, 2012-06-28 at 17:53 -0400, alexdeuc...@gmail.com wrote:
  From: Alex Deucher alexander.deuc...@amd.com
 
  Cayman and trinity allow for variable sized VM page
  tables, but SI requires that all page tables be the
  same size.  The current code assumes variablely sized
  VM page tables so SI may end up with part of each page
  table overlapping with other memory which could end
  up being interpreted by the VM hw as garbage.
 
  Change the code to better accomodate SI.  Allocate enough
  space for at least 2 full page tables and always set
  last_pfn to max_pfn on SI so each VM is backed by a full
  page table.  This limits us to only 2 VMs active at any
  given time on SI.  This will be rectified and the code can
  be reunified once we move to two level page tables.
 
  Signed-off-by: Alex Deucher alexander.deuc...@amd.com
 
  This change breaks the radeonsi driver for me. egltri_screen (the
  'golden' test for radeonsi at least basically working) locks up the
  GPU.
 
  I don't have any details about the lockup yet, as the GPU reset attempt
  hangs the machine. Any ideas offhand what radeonsi might be doing wrong?
 
  Maybe trying to access an unmapped page that happened to work by
  accident before and now causes a fault in the VM which halts the MC?

 Indeed, looks like it:

 radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000FF01B
 radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0202400C

 Oddly, while I have seen similar errors before (so at
 least some access to unmapped pages was caught even before your patch),
 I hadn't noticed them for a while with egltri_screen...


 Anyway, some more experimentation shows that it doesn't happen if I skip
 the clear, and it still happens when doing only a clear. I'll look into
 what might be wrong with the clears next week.


 Yeah only thing i can think of, can you get dump of various mc fault
 reg after lockup ?

 Did you have any particular registers in mind?


I am guessing it's related to default page behavior, previously to
this patch you would likely ended up writting/reading to the dummy
page and thus not getting the segfault you deserved. With this patch
you get the segfault you deserve ;)

Cheers,
Jerome
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/radeon: fix VM page table setup on SI

2012-06-28 Thread Jerome Glisse
On Thu, Jun 28, 2012 at 5:53 PM,  alexdeuc...@gmail.com wrote:
 From: Alex Deucher alexander.deuc...@amd.com

 Cayman and trinity allow for variable sized VM page
 tables, but SI requires that all page tables be the
 same size.  The current code assumes variablely sized
 VM page tables so SI may end up with part of each page
 table overlapping with other memory which could end
 up being interpreted by the VM hw as garbage.

 Change the code to better accomodate SI.  Allocate enough
 space for at least 2 full page tables and always set
 last_pfn to max_pfn on SI so each VM is backed by a full
 page table.  This limits us to only 2 VMs active at any
 given time on SI.  This will be rectified and the code can
 be reunified once we move to two level page tables.

 Signed-off-by: Alex Deucher alexander.deuc...@amd.com
Reviewed-by: Jerome Glisse jgli...@redhat.com
 Cc: sta...@vger.kernel.org
 ---
  drivers/gpu/drm/radeon/radeon_gart.c |   13 +++--
  drivers/gpu/drm/radeon/si.c          |    4 ++--
  2 files changed, 13 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
 b/drivers/gpu/drm/radeon/radeon_gart.c
 index 2b34c1a..8c44d1d 100644
 --- a/drivers/gpu/drm/radeon/radeon_gart.c
 +++ b/drivers/gpu/drm/radeon/radeon_gart.c
 @@ -289,8 +289,9 @@ int radeon_vm_manager_init(struct radeon_device *rdev)
        rdev-vm_manager.enabled = false;

        /* mark first vm as always in use, it's the system one */
 +       /* allocate enough for 2 full VM pts */
        r = radeon_sa_bo_manager_init(rdev, rdev-vm_manager.sa_manager,
 -                                     rdev-vm_manager.max_pfn * 8,
 +                                     rdev-vm_manager.max_pfn * 8 * 2,
                                      RADEON_GEM_DOMAIN_VRAM);
        if (r) {
                dev_err(rdev-dev, failed to allocate vm bo (%dKB)\n,
 @@ -633,7 +634,15 @@ int radeon_vm_init(struct radeon_device *rdev, struct 
 radeon_vm *vm)
        mutex_init(vm-mutex);
        INIT_LIST_HEAD(vm-list);
        INIT_LIST_HEAD(vm-va);
 -       vm-last_pfn = 0;
 +       /* SI requires equal sized PTs for all VMs, so always set
 +        * last_pfn to max_pfn.  cayman allows variable sized
 +        * pts so we can grow then as needed.  Once we switch
 +        * to two level pts we can unify this again.
 +        */
 +       if (rdev-family = CHIP_TAHITI)
 +               vm-last_pfn = rdev-vm_manager.max_pfn;
 +       else
 +               vm-last_pfn = 0;
        /* map the ib pool buffer at 0 in virtual address space, set
         * read only
         */
 diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
 index 34603b3c8..127600e 100644
 --- a/drivers/gpu/drm/radeon/si.c
 +++ b/drivers/gpu/drm/radeon/si.c
 @@ -2365,12 +2365,12 @@ int si_pcie_gart_enable(struct radeon_device *rdev)
        WREG32(0x15DC, 0);

        /* empty context1-15 */
 -       /* FIXME start with 1G, once using 2 level pt switch to full
 +       /* FIXME start with 4G, once using 2 level pt switch to full
         * vm size space
         */
        /* set vm size, must be a multiple of 4 */
        WREG32(VM_CONTEXT1_PAGE_TABLE_START_ADDR, 0);
 -       WREG32(VM_CONTEXT1_PAGE_TABLE_END_ADDR, (1  30) / 
 RADEON_GPU_PAGE_SIZE);
 +       WREG32(VM_CONTEXT1_PAGE_TABLE_END_ADDR, rdev-vm_manager.max_pfn);
        for (i = 1; i  16; i++) {
                if (i  8)
                        WREG32(VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (i  2),
 --
 1.7.7.5

 ___
 dri-devel mailing list
 dri-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/dri-devel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel