On Thu, Apr 17, 2025 at 1:06 AM Christian König
<christian.koe...@amd.com> wrote:
>
> Am 16.04.25 um 23:51 schrieb Juan Yescas:
>
> On Wed, Apr 16, 2025 at 4:34 AM Christian König
> <christian.koe...@amd.com> wrote:
>
>
> Am 15.04.25 um 19:19 schrieb Juan Yescas:
>
> This change sets the allocation orders for the different page sizes
> (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders
> for large page sizes were calculated incorrectly, this caused system
> heap to allocate from 2% to 4% more memory on 16KiB page size kernels.
>
> This change was tested on 4k/16k page size kernels.
>
> Signed-off-by: Juan Yescas <jyes...@google.com>
> ---
>  drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c 
> b/drivers/dma-buf/heaps/system_heap.c
> index 26d5dc89ea16..54674c02dcb4 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP, 
> HIGH_ORDER_GFP, LOW_ORDER_GFP};
>   * to match with the sizes often found in IOMMUs. Using order 4 pages instead
>   * of order 0 pages can significantly improve the performance of many IOMMUs
>   * by reducing TLB pressure and time spent updating page tables.
> + *
> + * Note: When the order is 0, the minimum allocation is PAGE_SIZE. The 
> possible
> + * page sizes for ARM devices could be 4K, 16K and 64K.
>   */
> -static const unsigned int orders[] = {8, 4, 0};
> +#define ORDER_1M (20 - PAGE_SHIFT)
> +#define ORDER_64K (16 - PAGE_SHIFT)
> +#define ORDER_FOR_PAGE_SIZE (0)
> +static const unsigned int orders[] = {ORDER_1M, ORDER_64K, 
> ORDER_FOR_PAGE_SIZE};
> +#
>
> Good catch, but I think the defines are just overkill.
>
> What you should do instead is to subtract page shift when using the array.
>
> There are several occurrences of orders in the file and I think it is
> better to do the calculations up front in the array. Would you be ok
> if we get rid of the defines as per your suggestion and make the
> calculations during the definition of the array. Something like this:
>
> static const unsigned int orders[] = {20 - PAGE_SHIFT, 16 - PAGE_SHIFT, 0};
>
>
> Yeah that totally works for me as well. Just make sure that a code comment 
> nearby explains why 20, 16 and 0.
>

Thanks, the changes were added in the [PATCH v3].

> Apart from that using 1M, 64K and then falling back to 4K just sounds random 
> to me. We have especially pushed back on 64K more than once because it is 
> actually not beneficial in almost all cases.
>
> In the hardware where the driver is used, the 64K is beneficial for:
>
> Arm SMMUv3 (4KB granule size):
>            64KB (contiguous bit groups 16 4KB pages)
>
> SysMMU benefits from 64KB (“large” page) on 4k/16k page sizes.
>
>
> Question could this HW also work with pages larger than 64K? (I strongly 
> expect yes).
>

Yes, if the page sizes in the SMMUv3 are:
      - 4k, we can have 2MiB pages
     - 16k, we can have 32MiB pages

> I suggest to fix the code in system_heap_allocate to not over allocate 
> instead and just try the available orders like TTM does. This has proven to 
> be working architecture independent.
>
> Do you mean to have an implementation similar to __ttm_pool_alloc()?
>
>
> Yeah something like that.
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/ttm/ttm_pool.c?h=v6.15-rc2#n728
>
> If that is the case, we can try the change, run some benchmarks and
> submit the patch if we see positive results.
>
>
> Could be that this doesn't matter for your platform, but it's minimal extra 
> overhead asking for larger chunks first and it just avoids fragmenting 
> everything into 64K chunks.
>
> That is kind of important since the code in DMA-heaps should be platform 
> neutral if possible.

I agree, I'll make the change in another patch.

Thanks
Juan

>
> Regards,
> Christian.
>
>
> Thanks
> Juan
>
> Regards,
> Christian.
>
>  #define NUM_ORDERS ARRAY_SIZE(orders)
>
>  static struct sg_table *dup_sg_table(struct sg_table *table)
>
>

Reply via email to