Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-12 Thread Alex Smith
On Tue, 11 Jun 2019 at 14:32, Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 10.06.19 um 15:56 schrieb Bas Nieuwenhuizen:
> > On Sat, Jun 8, 2019 at 3:36 PM Alex Smith 
> wrote:
> >> On Mon, 3 Jun 2019 at 13:27, Koenig, Christian <
> christian.koe...@amd.com> wrote:
> >>> Am 03.06.19 um 14:21 schrieb Alex Smith:
> >>>
> >>> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian <
> christian.koe...@amd.com> wrote:
>  Am 02.06.19 um 12:32 schrieb Alex Smith:
> > Put the uncached GTT type at a higher index than the visible VRAM
> type,
> > rather than having GTT first.
> >
> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
> > type, and the property flags for GTT and visible VRAM are identical.
> > According to the spec, for types with identical flags, we should give
> > the one with better performance a lower index.
> >
> > Previously, apps which follow the spec guidance for choosing a memory
> > type would have picked the GTT type in preference to visible VRAM
> (all
> > Feral games will do this), and end up with lower performance.
> >
> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple
> of
> > other (Feral) games and saw similar improvement on those as well.
>  Well that patch doesn't looks like a good idea to me.
> 
>  Using VRAM over uncached GTT should have something between no and only
>  minimal performance difference on APU.
> 
>  To make things even worse VRAM is still needed for scanout and newer
>  laptops have only a very very low default setting (32 or 16MB). So you
>  can end up in VRAM clashing on those systems.
> 
>  Can you check some kernel statistics to figure out what exactly is
> going
>  on here?
> >>>
> >>> What statistics should I look at?
> >>>
> >>>
> >>> First of all take a look at amdgpu_gem_info file in the debugfs
> directory while using GTT and match that to using VRAM. You should see a
> lot more of GTT ... CPU_GTT_USWC entries with the GTT variant. If the
> CPU_GTT_USWC flag is missing we have found the problem.
> >>>
> >>> If that looks ok, then take a look at the ttm_page_pool or
> ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got.
> Huge pages should be used for anything larger than 2MB, if not we have
> found the problem.
> >>>
> >>> If that still isn't the issue I need to take a look at the VM code
> again and see if we still map VRAM/GTT differently on APUs.
> >>
> >> OK, got around to looking at this. amdgpu_gem_info does have more USWC
> entries when using GTT. I've attached the output from VRAM vs GTT in case
> you can spot anything else in there.
> >>
> >> ttm_page_pool has 9806 wc, 238 wc huge, no uc or uc huge.
> > To add to this, I tried rounding up the size all application GTT
> > allocations to a multiple of 2 megabytes (+ a suballocator for buffers
> > < 2M). This increased performance a bit but not nearly what going from
> > GTT->"VRAM" brings.
>
> I need to dig deeper when I have a bit more time.
>
> The logs Alex provided didn't showed anything obviously wrong, so no
> idea what's the actual problem here.
>
> Anyway feel free to go ahead with this approach, but please keep in mind
> that this might cause problems on some systems.
>

Thanks Christian.

Bas, Samuel - what do you think to going ahead with this change in the
meantime? Perhaps we could add some threshold on minimum "VRAM" size
(256MB?) required to change the order, to avoid issues on systems where
that heap is really small?

Alex


>
> Christian.
>
> >
> >> FWIW this was from kernel 5.0.10, I just upgraded to 5.1.6 and still
> the same perf difference there.
> >>
> >> Thanks,
> >> Alex
> >>
> >>>
> >>> Thanks,
> >>> Christian.
> >>>
> >>>
> >>> Thanks,
> >>> Alex
> >>>
> 
>  Regards,
>  Christian.
> 
> > Signed-off-by: Alex Smith 
> > ---
> > I noticed that the memory types advertised on my Raven laptop looked
> a
> > bit odd so played around with it and found this. I'm not sure if it
> is
> > actually expected that the performance difference between visible
> VRAM
> > and GTT is so large, seeing as it's not dedicated VRAM, but the
> results
> > are clear (and consistent, tested multiple times).
> > ---
> >src/amd/vulkan/radv_device.c | 18 +++---
> >1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_device.c
> b/src/amd/vulkan/radv_device.c
> > index 3cf050ed220..d36ee226ebd 100644
> > --- a/src/amd/vulkan/radv_device.c
> > +++ b/src/amd/vulkan/radv_device.c
> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> >.heapIndex = vram_index,
> >};
> >}
> > - if (gart_index >= 0) {
> > +   

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-11 Thread Christian König

Am 10.06.19 um 15:56 schrieb Bas Nieuwenhuizen:

On Sat, Jun 8, 2019 at 3:36 PM Alex Smith  wrote:

On Mon, 3 Jun 2019 at 13:27, Koenig, Christian  wrote:

Am 03.06.19 um 14:21 schrieb Alex Smith:

On Mon, 3 Jun 2019 at 11:57, Koenig, Christian  wrote:

Am 02.06.19 um 12:32 schrieb Alex Smith:

Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.

When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.

Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.

On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.

Well that patch doesn't looks like a good idea to me.

Using VRAM over uncached GTT should have something between no and only
minimal performance difference on APU.

To make things even worse VRAM is still needed for scanout and newer
laptops have only a very very low default setting (32 or 16MB). So you
can end up in VRAM clashing on those systems.

Can you check some kernel statistics to figure out what exactly is going
on here?


What statistics should I look at?


First of all take a look at amdgpu_gem_info file in the debugfs directory while 
using GTT and match that to using VRAM. You should see a lot more of GTT ... 
CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag is missing 
we have found the problem.

If that looks ok, then take a look at the ttm_page_pool or ttm_dma_page_pool 
file and see how many wc/uc and wc/uc huge pages you got. Huge pages should be 
used for anything larger than 2MB, if not we have found the problem.

If that still isn't the issue I need to take a look at the VM code again and 
see if we still map VRAM/GTT differently on APUs.


OK, got around to looking at this. amdgpu_gem_info does have more USWC entries 
when using GTT. I've attached the output from VRAM vs GTT in case you can spot 
anything else in there.

ttm_page_pool has 9806 wc, 238 wc huge, no uc or uc huge.

To add to this, I tried rounding up the size all application GTT
allocations to a multiple of 2 megabytes (+ a suballocator for buffers
< 2M). This increased performance a bit but not nearly what going from
GTT->"VRAM" brings.


I need to dig deeper when I have a bit more time.

The logs Alex provided didn't showed anything obviously wrong, so no 
idea what's the actual problem here.


Anyway feel free to go ahead with this approach, but please keep in mind 
that this might cause problems on some systems.


Christian.




FWIW this was from kernel 5.0.10, I just upgraded to 5.1.6 and still the same 
perf difference there.

Thanks,
Alex



Thanks,
Christian.


Thanks,
Alex



Regards,
Christian.


Signed-off-by: Alex Smith 
---
I noticed that the memory types advertised on my Raven laptop looked a
bit odd so played around with it and found this. I'm not sure if it is
actually expected that the performance difference between visible VRAM
and GTT is so large, seeing as it's not dedicated VRAM, but the results
are clear (and consistent, tested multiple times).
---
   src/amd/vulkan/radv_device.c | 18 +++---
   1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 3cf050ed220..d36ee226ebd 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
   .heapIndex = vram_index,
   };
   }
- if (gart_index >= 0) {
+ if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
   device->mem_type_indices[type_count] = 
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
   device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
   .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
- VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
- (device->rad_info.has_dedicated_vram ? 0 : 
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
+ VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
   .heapIndex = gart_index,
   };
   }
@@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
   .heapIndex = visible_vram_index,
   };
   }
+ if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
+ /* Put GTT after visible VRAM for GPUs without dedicated VRAM
+  * as they have identical property flags, and according to the

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-10 Thread Bas Nieuwenhuizen
On Sat, Jun 8, 2019 at 3:36 PM Alex Smith  wrote:
>
> On Mon, 3 Jun 2019 at 13:27, Koenig, Christian  
> wrote:
>>
>> Am 03.06.19 um 14:21 schrieb Alex Smith:
>>
>> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian  
>> wrote:
>>>
>>> Am 02.06.19 um 12:32 schrieb Alex Smith:
>>> > Put the uncached GTT type at a higher index than the visible VRAM type,
>>> > rather than having GTT first.
>>> >
>>> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
>>> > type, and the property flags for GTT and visible VRAM are identical.
>>> > According to the spec, for types with identical flags, we should give
>>> > the one with better performance a lower index.
>>> >
>>> > Previously, apps which follow the spec guidance for choosing a memory
>>> > type would have picked the GTT type in preference to visible VRAM (all
>>> > Feral games will do this), and end up with lower performance.
>>> >
>>> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
>>> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
>>> > other (Feral) games and saw similar improvement on those as well.
>>>
>>> Well that patch doesn't looks like a good idea to me.
>>>
>>> Using VRAM over uncached GTT should have something between no and only
>>> minimal performance difference on APU.
>>>
>>> To make things even worse VRAM is still needed for scanout and newer
>>> laptops have only a very very low default setting (32 or 16MB). So you
>>> can end up in VRAM clashing on those systems.
>>>
>>> Can you check some kernel statistics to figure out what exactly is going
>>> on here?
>>
>>
>> What statistics should I look at?
>>
>>
>> First of all take a look at amdgpu_gem_info file in the debugfs directory 
>> while using GTT and match that to using VRAM. You should see a lot more of 
>> GTT ... CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag 
>> is missing we have found the problem.
>>
>> If that looks ok, then take a look at the ttm_page_pool or ttm_dma_page_pool 
>> file and see how many wc/uc and wc/uc huge pages you got. Huge pages should 
>> be used for anything larger than 2MB, if not we have found the problem.
>>
>> If that still isn't the issue I need to take a look at the VM code again and 
>> see if we still map VRAM/GTT differently on APUs.
>
>
> OK, got around to looking at this. amdgpu_gem_info does have more USWC 
> entries when using GTT. I've attached the output from VRAM vs GTT in case you 
> can spot anything else in there.
>
> ttm_page_pool has 9806 wc, 238 wc huge, no uc or uc huge.

To add to this, I tried rounding up the size all application GTT
allocations to a multiple of 2 megabytes (+ a suballocator for buffers
< 2M). This increased performance a bit but not nearly what going from
GTT->"VRAM" brings.

>
> FWIW this was from kernel 5.0.10, I just upgraded to 5.1.6 and still the same 
> perf difference there.
>
> Thanks,
> Alex
>
>>
>>
>> Thanks,
>> Christian.
>>
>>
>> Thanks,
>> Alex
>>
>>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> >
>>> > Signed-off-by: Alex Smith 
>>> > ---
>>> > I noticed that the memory types advertised on my Raven laptop looked a
>>> > bit odd so played around with it and found this. I'm not sure if it is
>>> > actually expected that the performance difference between visible VRAM
>>> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
>>> > are clear (and consistent, tested multiple times).
>>> > ---
>>> >   src/amd/vulkan/radv_device.c | 18 +++---
>>> >   1 file changed, 15 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>>> > index 3cf050ed220..d36ee226ebd 100644
>>> > --- a/src/amd/vulkan/radv_device.c
>>> > +++ b/src/amd/vulkan/radv_device.c
>>> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
>>> > radv_physical_device *device)
>>> >   .heapIndex = vram_index,
>>> >   };
>>> >   }
>>> > - if (gart_index >= 0) {
>>> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>>> >   device->mem_type_indices[type_count] = 
>>> > RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>>> >   device->memory_properties.memoryTypes[type_count++] = 
>>> > (VkMemoryType) {
>>> >   .propertyFlags = 
>>> > VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>>> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
>>> > - (device->rad_info.has_dedicated_vram ? 0 : 
>>> > VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
>>> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>>> >   .heapIndex = gart_index,
>>> >   };
>>> >   }
>>> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
>>> > radv_physical_device *device)
>>> >   .heapIndex = visible_vram_index,
>>> >   };
>>> >   }
>>> > + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
>>> > 

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-08 Thread Alex Smith
On Mon, 3 Jun 2019 at 13:27, Koenig, Christian 
wrote:

> Am 03.06.19 um 14:21 schrieb Alex Smith:
>
> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian 
> wrote:
>
>> Am 02.06.19 um 12:32 schrieb Alex Smith:
>> > Put the uncached GTT type at a higher index than the visible VRAM type,
>> > rather than having GTT first.
>> >
>> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
>> > type, and the property flags for GTT and visible VRAM are identical.
>> > According to the spec, for types with identical flags, we should give
>> > the one with better performance a lower index.
>> >
>> > Previously, apps which follow the spec guidance for choosing a memory
>> > type would have picked the GTT type in preference to visible VRAM (all
>> > Feral games will do this), and end up with lower performance.
>> >
>> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
>> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
>> > other (Feral) games and saw similar improvement on those as well.
>>
>> Well that patch doesn't looks like a good idea to me.
>>
>> Using VRAM over uncached GTT should have something between no and only
>> minimal performance difference on APU.
>>
>> To make things even worse VRAM is still needed for scanout and newer
>> laptops have only a very very low default setting (32 or 16MB). So you
>> can end up in VRAM clashing on those systems.
>>
>> Can you check some kernel statistics to figure out what exactly is going
>> on here?
>>
>
> What statistics should I look at?
>
>
> First of all take a look at amdgpu_gem_info file in the debugfs directory
> while using GTT and match that to using VRAM. You should see a lot more of
> GTT ... CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag
> is missing we have found the problem.
>
> If that looks ok, then take a look at the ttm_page_pool or
> ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got.
> Huge pages should be used for anything larger than 2MB, if not we have
> found the problem.
>
> If that still isn't the issue I need to take a look at the VM code again
> and see if we still map VRAM/GTT differently on APUs.
>

OK, got around to looking at this. amdgpu_gem_info does have more USWC
entries when using GTT. I've attached the output from VRAM vs GTT in case
you can spot anything else in there.

ttm_page_pool has 9806 wc, 238 wc huge, no uc or uc huge.

FWIW this was from kernel 5.0.10, I just upgraded to 5.1.6 and still the
same perf difference there.

Thanks,
Alex


>
> Thanks,
> Christian.
>
>
> Thanks,
> Alex
>
>
>>
>> Regards,
>> Christian.
>>
>> >
>> > Signed-off-by: Alex Smith 
>> > ---
>> > I noticed that the memory types advertised on my Raven laptop looked a
>> > bit odd so played around with it and found this. I'm not sure if it is
>> > actually expected that the performance difference between visible VRAM
>> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
>> > are clear (and consistent, tested multiple times).
>> > ---
>> >   src/amd/vulkan/radv_device.c | 18 +++---
>> >   1 file changed, 15 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> > index 3cf050ed220..d36ee226ebd 100644
>> > --- a/src/amd/vulkan/radv_device.c
>> > +++ b/src/amd/vulkan/radv_device.c
>> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
>> radv_physical_device *device)
>> >   .heapIndex = vram_index,
>> >   };
>> >   }
>> > - if (gart_index >= 0) {
>> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>> >   device->mem_type_indices[type_count] =
>> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>> >   device->memory_properties.memoryTypes[type_count++] =
>> (VkMemoryType) {
>> >   .propertyFlags =
>> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
>> > - (device->rad_info.has_dedicated_vram ? 0 :
>> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
>> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>> >   .heapIndex = gart_index,
>> >   };
>> >   }
>> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
>> radv_physical_device *device)
>> >   .heapIndex = visible_vram_index,
>> >   };
>> >   }
>> > + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
>> > + /* Put GTT after visible VRAM for GPUs without dedicated
>> VRAM
>> > +  * as they have identical property flags, and according
>> to the
>> > +  * spec, for types with identical flags, the one with
>> greater
>> > +  * performance must be given a lower index. */
>> > + device->mem_type_indices[type_count] =
>> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>> > + 

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Alex Smith
Thanks, will have a look - can't look right now, will see if I can sometime
tomorrow.

Alex

On Mon, 3 Jun 2019 at 13:27, Koenig, Christian 
wrote:

> Am 03.06.19 um 14:21 schrieb Alex Smith:
>
> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian 
> wrote:
>
>> Am 02.06.19 um 12:32 schrieb Alex Smith:
>> > Put the uncached GTT type at a higher index than the visible VRAM type,
>> > rather than having GTT first.
>> >
>> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
>> > type, and the property flags for GTT and visible VRAM are identical.
>> > According to the spec, for types with identical flags, we should give
>> > the one with better performance a lower index.
>> >
>> > Previously, apps which follow the spec guidance for choosing a memory
>> > type would have picked the GTT type in preference to visible VRAM (all
>> > Feral games will do this), and end up with lower performance.
>> >
>> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
>> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
>> > other (Feral) games and saw similar improvement on those as well.
>>
>> Well that patch doesn't looks like a good idea to me.
>>
>> Using VRAM over uncached GTT should have something between no and only
>> minimal performance difference on APU.
>>
>> To make things even worse VRAM is still needed for scanout and newer
>> laptops have only a very very low default setting (32 or 16MB). So you
>> can end up in VRAM clashing on those systems.
>>
>> Can you check some kernel statistics to figure out what exactly is going
>> on here?
>>
>
> What statistics should I look at?
>
>
> First of all take a look at amdgpu_gem_info file in the debugfs directory
> while using GTT and match that to using VRAM. You should see a lot more of
> GTT ... CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag
> is missing we have found the problem.
>
> If that looks ok, then take a look at the ttm_page_pool or
> ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got.
> Huge pages should be used for anything larger than 2MB, if not we have
> found the problem.
>
> If that still isn't the issue I need to take a look at the VM code again
> and see if we still map VRAM/GTT differently on APUs.
>
> Thanks,
> Christian.
>
>
> Thanks,
> Alex
>
>
>>
>> Regards,
>> Christian.
>>
>> >
>> > Signed-off-by: Alex Smith 
>> > ---
>> > I noticed that the memory types advertised on my Raven laptop looked a
>> > bit odd so played around with it and found this. I'm not sure if it is
>> > actually expected that the performance difference between visible VRAM
>> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
>> > are clear (and consistent, tested multiple times).
>> > ---
>> >   src/amd/vulkan/radv_device.c | 18 +++---
>> >   1 file changed, 15 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> > index 3cf050ed220..d36ee226ebd 100644
>> > --- a/src/amd/vulkan/radv_device.c
>> > +++ b/src/amd/vulkan/radv_device.c
>> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
>> radv_physical_device *device)
>> >   .heapIndex = vram_index,
>> >   };
>> >   }
>> > - if (gart_index >= 0) {
>> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>> >   device->mem_type_indices[type_count] =
>> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>> >   device->memory_properties.memoryTypes[type_count++] =
>> (VkMemoryType) {
>> >   .propertyFlags =
>> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
>> > - (device->rad_info.has_dedicated_vram ? 0 :
>> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
>> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>> >   .heapIndex = gart_index,
>> >   };
>> >   }
>> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
>> radv_physical_device *device)
>> >   .heapIndex = visible_vram_index,
>> >   };
>> >   }
>> > + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
>> > + /* Put GTT after visible VRAM for GPUs without dedicated
>> VRAM
>> > +  * as they have identical property flags, and according
>> to the
>> > +  * spec, for types with identical flags, the one with
>> greater
>> > +  * performance must be given a lower index. */
>> > + device->mem_type_indices[type_count] =
>> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>> > + device->memory_properties.memoryTypes[type_count++] =
>> (VkMemoryType) {
>> > + .propertyFlags =
>> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
>> > + VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>> > + 

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Koenig, Christian
Am 03.06.19 um 14:21 schrieb Alex Smith:
On Mon, 3 Jun 2019 at 11:57, Koenig, Christian 
mailto:christian.koe...@amd.com>> wrote:
Am 02.06.19 um 12:32 schrieb Alex Smith:
> Put the uncached GTT type at a higher index than the visible VRAM type,
> rather than having GTT first.
>
> When we don't have dedicated VRAM, we don't have a non-visible VRAM
> type, and the property flags for GTT and visible VRAM are identical.
> According to the spec, for types with identical flags, we should give
> the one with better performance a lower index.
>
> Previously, apps which follow the spec guidance for choosing a memory
> type would have picked the GTT type in preference to visible VRAM (all
> Feral games will do this), and end up with lower performance.
>
> On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> other (Feral) games and saw similar improvement on those as well.

Well that patch doesn't looks like a good idea to me.

Using VRAM over uncached GTT should have something between no and only
minimal performance difference on APU.

To make things even worse VRAM is still needed for scanout and newer
laptops have only a very very low default setting (32 or 16MB). So you
can end up in VRAM clashing on those systems.

Can you check some kernel statistics to figure out what exactly is going
on here?

What statistics should I look at?

First of all take a look at amdgpu_gem_info file in the debugfs directory while 
using GTT and match that to using VRAM. You should see a lot more of GTT ... 
CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag is missing 
we have found the problem.

If that looks ok, then take a look at the ttm_page_pool or ttm_dma_page_pool 
file and see how many wc/uc and wc/uc huge pages you got. Huge pages should be 
used for anything larger than 2MB, if not we have found the problem.

If that still isn't the issue I need to take a look at the VM code again and 
see if we still map VRAM/GTT differently on APUs.

Thanks,
Christian.


Thanks,
Alex


Regards,
Christian.

>
> Signed-off-by: Alex Smith 
> mailto:asm...@feralinteractive.com>>
> ---
> I noticed that the memory types advertised on my Raven laptop looked a
> bit odd so played around with it and found this. I'm not sure if it is
> actually expected that the performance difference between visible VRAM
> and GTT is so large, seeing as it's not dedicated VRAM, but the results
> are clear (and consistent, tested multiple times).
> ---
>   src/amd/vulkan/radv_device.c | 18 +++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 3cf050ed220..d36ee226ebd 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
> radv_physical_device *device)
>   .heapIndex = vram_index,
>   };
>   }
> - if (gart_index >= 0) {
> + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>   device->mem_type_indices[type_count] = 
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>   device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {
>   .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> - (device->rad_info.has_dedicated_vram ? 0 : 
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>   .heapIndex = gart_index,
>   };
>   }
> @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
> radv_physical_device *device)
>   .heapIndex = visible_vram_index,
>   };
>   }
> + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> + /* Put GTT after visible VRAM for GPUs without dedicated VRAM
> +  * as they have identical property flags, and according to the
> +  * spec, for types with identical flags, the one with greater
> +  * performance must be given a lower index. */
> + device->mem_type_indices[type_count] = 
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> + device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {
> + .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> + VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> + .heapIndex = gart_index,
> + };
> + }
>   if (gart_index >= 0) {
>   device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
>   device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {


___
mesa-dev mailing list

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Alex Smith
On Mon, 3 Jun 2019 at 11:57, Koenig, Christian 
wrote:

> Am 02.06.19 um 12:32 schrieb Alex Smith:
> > Put the uncached GTT type at a higher index than the visible VRAM type,
> > rather than having GTT first.
> >
> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
> > type, and the property flags for GTT and visible VRAM are identical.
> > According to the spec, for types with identical flags, we should give
> > the one with better performance a lower index.
> >
> > Previously, apps which follow the spec guidance for choosing a memory
> > type would have picked the GTT type in preference to visible VRAM (all
> > Feral games will do this), and end up with lower performance.
> >
> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> > other (Feral) games and saw similar improvement on those as well.
>
> Well that patch doesn't looks like a good idea to me.
>
> Using VRAM over uncached GTT should have something between no and only
> minimal performance difference on APU.
>
> To make things even worse VRAM is still needed for scanout and newer
> laptops have only a very very low default setting (32 or 16MB). So you
> can end up in VRAM clashing on those systems.
>
> Can you check some kernel statistics to figure out what exactly is going
> on here?
>

What statistics should I look at?

Thanks,
Alex


>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Alex Smith 
> > ---
> > I noticed that the memory types advertised on my Raven laptop looked a
> > bit odd so played around with it and found this. I'm not sure if it is
> > actually expected that the performance difference between visible VRAM
> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
> > are clear (and consistent, tested multiple times).
> > ---
> >   src/amd/vulkan/radv_device.c | 18 +++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> > index 3cf050ed220..d36ee226ebd 100644
> > --- a/src/amd/vulkan/radv_device.c
> > +++ b/src/amd/vulkan/radv_device.c
> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> >   .heapIndex = vram_index,
> >   };
> >   }
> > - if (gart_index >= 0) {
> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
> >   device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> >   device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> >   .propertyFlags =
> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> > - (device->rad_info.has_dedicated_vram ? 0 :
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> >   .heapIndex = gart_index,
> >   };
> >   }
> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> >   .heapIndex = visible_vram_index,
> >   };
> >   }
> > + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> > + /* Put GTT after visible VRAM for GPUs without dedicated
> VRAM
> > +  * as they have identical property flags, and according to
> the
> > +  * spec, for types with identical flags, the one with
> greater
> > +  * performance must be given a lower index. */
> > + device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> > + device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > + .propertyFlags =
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> > + VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> > + .heapIndex = gart_index,
> > + };
> > + }
> >   if (gart_index >= 0) {
> >   device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_CACHED;
> >   device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Koenig, Christian
Am 02.06.19 um 12:32 schrieb Alex Smith:
> Put the uncached GTT type at a higher index than the visible VRAM type,
> rather than having GTT first.
>
> When we don't have dedicated VRAM, we don't have a non-visible VRAM
> type, and the property flags for GTT and visible VRAM are identical.
> According to the spec, for types with identical flags, we should give
> the one with better performance a lower index.
>
> Previously, apps which follow the spec guidance for choosing a memory
> type would have picked the GTT type in preference to visible VRAM (all
> Feral games will do this), and end up with lower performance.
>
> On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> other (Feral) games and saw similar improvement on those as well.

Well that patch doesn't looks like a good idea to me.

Using VRAM over uncached GTT should have something between no and only 
minimal performance difference on APU.

To make things even worse VRAM is still needed for scanout and newer 
laptops have only a very very low default setting (32 or 16MB). So you 
can end up in VRAM clashing on those systems.

Can you check some kernel statistics to figure out what exactly is going 
on here?

Regards,
Christian.

>
> Signed-off-by: Alex Smith 
> ---
> I noticed that the memory types advertised on my Raven laptop looked a
> bit odd so played around with it and found this. I'm not sure if it is
> actually expected that the performance difference between visible VRAM
> and GTT is so large, seeing as it's not dedicated VRAM, but the results
> are clear (and consistent, tested multiple times).
> ---
>   src/amd/vulkan/radv_device.c | 18 +++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 3cf050ed220..d36ee226ebd 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
> radv_physical_device *device)
>   .heapIndex = vram_index,
>   };
>   }
> - if (gart_index >= 0) {
> + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>   device->mem_type_indices[type_count] = 
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>   device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {
>   .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> - (device->rad_info.has_dedicated_vram ? 0 : 
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>   .heapIndex = gart_index,
>   };
>   }
> @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
> radv_physical_device *device)
>   .heapIndex = visible_vram_index,
>   };
>   }
> + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> + /* Put GTT after visible VRAM for GPUs without dedicated VRAM
> +  * as they have identical property flags, and according to the
> +  * spec, for types with identical flags, the one with greater
> +  * performance must be given a lower index. */
> + device->mem_type_indices[type_count] = 
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> + device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {
> + .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> + VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> + .heapIndex = gart_index,
> + };
> + }
>   if (gart_index >= 0) {
>   device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
>   device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Samuel Pitoiset


On 6/3/19 9:46 AM, Alex Smith wrote:
On Sun, 2 Jun 2019 at 11:59, Bas Nieuwenhuizen 
mailto:b...@basnieuwenhuizen.nl>> wrote:


On Sun, Jun 2, 2019 at 12:32 PM Alex Smith
mailto:asm...@feralinteractive.com>>
wrote:
>
> Put the uncached GTT type at a higher index than the visible
VRAM type,
> rather than having GTT first.
>
> When we don't have dedicated VRAM, we don't have a non-visible VRAM
> type, and the property flags for GTT and visible VRAM are identical.
> According to the spec, for types with identical flags, we should
give
> the one with better performance a lower index.
>
> Previously, apps which follow the spec guidance for choosing a
memory
> type would have picked the GTT type in preference to visible
VRAM (all
> Feral games will do this), and end up with lower performance.
>
> On a Ryzen 5 2500U laptop (Raven Ridge), this improves average
FPS in
> the Rise of the Tomb Raider benchmark by up to ~30%. Tested a
couple of
> other (Feral) games and saw similar improvement on those as well.
>
> Signed-off-by: Alex Smith mailto:asm...@feralinteractive.com>>
> ---
> I noticed that the memory types advertised on my Raven laptop
looked a
> bit odd so played around with it and found this. I'm not sure if
it is
> actually expected that the performance difference between
visible VRAM
> and GTT is so large, seeing as it's not dedicated VRAM, but the
results
> are clear (and consistent, tested multiple times).

AFAIU it is still using different memory paths, with GTT using
different pagetables (those from the CPU I believe on APUs) and
possible CPU snooping.

Main risk here seems applications pushing out driver internal stuff
(descriptor sets etc.) from "VRAM", posssibly hitting perf elsewhere.


Driver internal allocations have higher BO priorities than all app 
allocations, wouldn't that help avoid that? I'm not sure how much 
effect the priorities actually have...

Priorities shouldn't matter much.


That said,

Reviewed-by: Bas Nieuwenhuizen mailto:b...@basnieuwenhuizen.nl>>

> ---
>  src/amd/vulkan/radv_device.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c
b/src/amd/vulkan/radv_device.c
> index 3cf050ed220..d36ee226ebd 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
radv_physical_device *device)
>                         .heapIndex = vram_index,
>                 };
>         }
> -       if (gart_index >= 0) {
> +       if (gart_index >= 0 &&
device->rad_info.has_dedicated_vram) {
>                 device->mem_type_indices[type_count] =
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>  device->memory_properties.memoryTypes[type_count++] =
(VkMemoryType) {
>                         .propertyFlags =
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> -  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> -  (device->rad_info.has_dedicated_vram ? 0 :
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> +  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>                         .heapIndex = gart_index,
>                 };
>         }
> @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
radv_physical_device *device)
>                         .heapIndex = visible_vram_index,
>                 };
>         }
> +       if (gart_index >= 0 &&
!device->rad_info.has_dedicated_vram) {
> +               /* Put GTT after visible VRAM for GPUs without
dedicated VRAM
> +                * as they have identical property flags, and
according to the
> +                * spec, for types with identical flags, the one
with greater
> +                * performance must be given a lower index. */
> +               device->mem_type_indices[type_count] =
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> +  device->memory_properties.memoryTypes[type_count++] =
(VkMemoryType) {
> +                       .propertyFlags =
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> +  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> +  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> +                       .heapIndex = gart_index,
> +               };
> +       }
>         if (gart_index >= 0) {
>                 device->mem_type_indices[type_count] =
RADV_MEM_TYPE_GTT_CACHED;
>  device->memory_properties.memoryTypes[type_count++] =
(VkMemoryType) {
> --
> 2.21.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Alex Smith
On Sun, 2 Jun 2019 at 11:59, Bas Nieuwenhuizen 
wrote:

> On Sun, Jun 2, 2019 at 12:32 PM Alex Smith 
> wrote:
> >
> > Put the uncached GTT type at a higher index than the visible VRAM type,
> > rather than having GTT first.
> >
> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
> > type, and the property flags for GTT and visible VRAM are identical.
> > According to the spec, for types with identical flags, we should give
> > the one with better performance a lower index.
> >
> > Previously, apps which follow the spec guidance for choosing a memory
> > type would have picked the GTT type in preference to visible VRAM (all
> > Feral games will do this), and end up with lower performance.
> >
> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> > other (Feral) games and saw similar improvement on those as well.
> >
> > Signed-off-by: Alex Smith 
> > ---
> > I noticed that the memory types advertised on my Raven laptop looked a
> > bit odd so played around with it and found this. I'm not sure if it is
> > actually expected that the performance difference between visible VRAM
> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
> > are clear (and consistent, tested multiple times).
>
> AFAIU it is still using different memory paths, with GTT using
> different pagetables (those from the CPU I believe on APUs) and
> possible CPU snooping.
>
> Main risk here seems applications pushing out driver internal stuff
> (descriptor sets etc.) from "VRAM", posssibly hitting perf elsewhere.
>

Driver internal allocations have higher BO priorities than all app
allocations, wouldn't that help avoid that? I'm not sure how much effect
the priorities actually have...


> That said,
>
> Reviewed-by: Bas Nieuwenhuizen 
>
> > ---
> >  src/amd/vulkan/radv_device.c | 18 +++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> > index 3cf050ed220..d36ee226ebd 100644
> > --- a/src/amd/vulkan/radv_device.c
> > +++ b/src/amd/vulkan/radv_device.c
> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> > .heapIndex = vram_index,
> > };
> > }
> > -   if (gart_index >= 0) {
> > +   if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
> > device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> > device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > .propertyFlags =
> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > -   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> > -   (device->rad_info.has_dedicated_vram ? 0 :
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> > +   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> > .heapIndex = gart_index,
> > };
> > }
> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> > .heapIndex = visible_vram_index,
> > };
> > }
> > +   if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> > +   /* Put GTT after visible VRAM for GPUs without dedicated
> VRAM
> > +* as they have identical property flags, and according
> to the
> > +* spec, for types with identical flags, the one with
> greater
> > +* performance must be given a lower index. */
> > +   device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> > +   device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > +   .propertyFlags =
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> > +   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > +   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> > +   .heapIndex = gart_index,
> > +   };
> > +   }
> > if (gart_index >= 0) {
> > device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_CACHED;
> > device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > --
> > 2.21.0
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

On 6/2/19 12:32 PM, Alex Smith wrote:

Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.

When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.

Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.

On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.

Signed-off-by: Alex Smith 
---
I noticed that the memory types advertised on my Raven laptop looked a
bit odd so played around with it and found this. I'm not sure if it is
actually expected that the performance difference between visible VRAM
and GTT is so large, seeing as it's not dedicated VRAM, but the results
are clear (and consistent, tested multiple times).
---
  src/amd/vulkan/radv_device.c | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 3cf050ed220..d36ee226ebd 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
.heapIndex = vram_index,
};
}
-   if (gart_index >= 0) {
+   if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
device->mem_type_indices[type_count] = 
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
-   (device->rad_info.has_dedicated_vram ? 0 : 
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
+   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
.heapIndex = gart_index,
};
}
@@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
.heapIndex = visible_vram_index,
};
}
+   if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
+   /* Put GTT after visible VRAM for GPUs without dedicated VRAM
+* as they have identical property flags, and according to the
+* spec, for types with identical flags, the one with greater
+* performance must be given a lower index. */
+   device->mem_type_indices[type_count] = 
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
+   device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
+   .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
+   .heapIndex = gart_index,
+   };
+   }
if (gart_index >= 0) {
device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-02 Thread Bas Nieuwenhuizen
On Sun, Jun 2, 2019 at 12:32 PM Alex Smith  wrote:
>
> Put the uncached GTT type at a higher index than the visible VRAM type,
> rather than having GTT first.
>
> When we don't have dedicated VRAM, we don't have a non-visible VRAM
> type, and the property flags for GTT and visible VRAM are identical.
> According to the spec, for types with identical flags, we should give
> the one with better performance a lower index.
>
> Previously, apps which follow the spec guidance for choosing a memory
> type would have picked the GTT type in preference to visible VRAM (all
> Feral games will do this), and end up with lower performance.
>
> On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> other (Feral) games and saw similar improvement on those as well.
>
> Signed-off-by: Alex Smith 
> ---
> I noticed that the memory types advertised on my Raven laptop looked a
> bit odd so played around with it and found this. I'm not sure if it is
> actually expected that the performance difference between visible VRAM
> and GTT is so large, seeing as it's not dedicated VRAM, but the results
> are clear (and consistent, tested multiple times).

AFAIU it is still using different memory paths, with GTT using
different pagetables (those from the CPU I believe on APUs) and
possible CPU snooping.

Main risk here seems applications pushing out driver internal stuff
(descriptor sets etc.) from "VRAM", posssibly hitting perf elsewhere.
That said,

Reviewed-by: Bas Nieuwenhuizen 

> ---
>  src/amd/vulkan/radv_device.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 3cf050ed220..d36ee226ebd 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
> radv_physical_device *device)
> .heapIndex = vram_index,
> };
> }
> -   if (gart_index >= 0) {
> +   if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
> device->mem_type_indices[type_count] = 
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {
> .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> -   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> -   (device->rad_info.has_dedicated_vram ? 0 : 
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> +   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> .heapIndex = gart_index,
> };
> }
> @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
> radv_physical_device *device)
> .heapIndex = visible_vram_index,
> };
> }
> +   if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> +   /* Put GTT after visible VRAM for GPUs without dedicated VRAM
> +* as they have identical property flags, and according to the
> +* spec, for types with identical flags, the one with greater
> +* performance must be given a lower index. */
> +   device->mem_type_indices[type_count] = 
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> +   device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {
> +   .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> +   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> +   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> +   .heapIndex = gart_index,
> +   };
> +   }
> if (gart_index >= 0) {
> device->mem_type_indices[type_count] = 
> RADV_MEM_TYPE_GTT_CACHED;
> device->memory_properties.memoryTypes[type_count++] = 
> (VkMemoryType) {
> --
> 2.21.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-02 Thread Alex Smith
Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.

When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.

Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.

On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.

Signed-off-by: Alex Smith 
---
I noticed that the memory types advertised on my Raven laptop looked a
bit odd so played around with it and found this. I'm not sure if it is
actually expected that the performance difference between visible VRAM
and GTT is so large, seeing as it's not dedicated VRAM, but the results
are clear (and consistent, tested multiple times).
---
 src/amd/vulkan/radv_device.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 3cf050ed220..d36ee226ebd 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
.heapIndex = vram_index,
};
}
-   if (gart_index >= 0) {
+   if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
device->mem_type_indices[type_count] = 
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
-   (device->rad_info.has_dedicated_vram ? 0 : 
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
+   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
.heapIndex = gart_index,
};
}
@@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
.heapIndex = visible_vram_index,
};
}
+   if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
+   /* Put GTT after visible VRAM for GPUs without dedicated VRAM
+* as they have identical property flags, and according to the
+* spec, for types with identical flags, the one with greater
+* performance must be given a lower index. */
+   device->mem_type_indices[type_count] = 
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
+   device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
+   .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
+   .heapIndex = gart_index,
+   };
+   }
if (gart_index >= 0) {
device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
-- 
2.21.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev