prime: Iterate SG DMA addresses separately

Christian König Thu, 12 Apr 2018 02:36:32 -0700

On 12.04.2018 at 11:11, Lucas Stach wrote:

> On Wednesday, 11.04.2018, at 20:26 +0200, Christian König wrote:
>
>> On 11.04.2018 at 19:11, Robin Murphy wrote:

>>> For dma_map_sg(), DMA API implementations are free to merge consecutive
>>> segments into a single DMA mapping if conditions are suitable, thus the
>>> resulting DMA addresses may be packed into fewer entries than
>>> ttm->sg->nents implies.
>>>
>>> drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
>>> its callers either have to reject the 0 < count < nents case or risk
>>> getting bogus addresses back later. Fortunately this is relatively easy
>>> to deal with without having to rejig structures to also store the mapped
>>> count, since the total DMA length should still be equal to the total
>>> buffer length. All we need is a separate scatterlist cursor to iterate
>>> the DMA addresses separately from the CPU addresses.
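The separate-cursor idea in the patch description can be illustrated with a small userspace sketch. It does not use the kernel's scatterlist API: `struct sketch_sg`, its fields, `SKETCH_PAGE_SIZE` and `sketch_sg_to_page_arrays()` are hypothetical stand-ins, and it assumes 4 KiB pages with every CPU and DMA segment length being a page multiple. One cursor walks all `nents` CPU segments to emit page frame numbers, while a second cursor walks only the first `dma_count` entries (the value a `dma_map_sg()`-style call would have returned) to emit one DMA address per page.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages and page-multiple segment lengths. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for one scatterlist entry. The CPU fields are valid
 * for all nents entries; the DMA fields are valid only for the first
 * dma_count entries, i.e. the count a dma_map_sg()-style call returned. */
struct sketch_sg {
    uint64_t cpu_pfn;     /* first page frame number of the CPU segment */
    size_t   length;      /* CPU segment length in bytes */
    uint64_t dma_address; /* DMA segment start (index < dma_count only) */
    size_t   dma_length;  /* DMA segment length (index < dma_count only) */
};

/* Fill per-page arrays using two independent cursors: one over the nents CPU
 * segments, one over the dma_count (possibly merged) DMA segments.
 * Returns the number of pages filled, or -1 if an output array would
 * overflow or the DMA segments cover less than the CPU segments. */
int sketch_sg_to_page_arrays(const struct sketch_sg *sg, size_t nents,
                             size_t dma_count, uint64_t *pfns,
                             uint64_t *addrs, size_t max_pages)
{
    size_t page = 0;
    size_t dma_idx = 0; /* DMA cursor: which mapped segment we are in */
    size_t dma_off = 0; /* byte offset inside sg[dma_idx]'s DMA range */

    for (size_t i = 0; i < nents; i++) {          /* CPU cursor */
        for (size_t off = 0; off < sg[i].length; off += SKETCH_PAGE_SIZE) {
            if (page >= max_pages || dma_idx >= dma_count)
                return -1;
            pfns[page] = sg[i].cpu_pfn + off / SKETCH_PAGE_SIZE;
            addrs[page] = sg[dma_idx].dma_address + dma_off;
            page++;

            /* Advance the DMA cursor on its own schedule, not per CPU entry. */
            dma_off += SKETCH_PAGE_SIZE;
            if (dma_off >= sg[dma_idx].dma_length) {
                dma_idx++;
                dma_off = 0;
            }
        }
    }
    return (int)page;
}
```

For example, three CPU segments of 1, 2 and 1 pages that were mapped into a single 4-page DMA segment at 0x100000 yield the per-page addresses 0x100000 through 0x103000, even though only the first entry carries DMA information.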

>> Mhm, I think I like Sina's approach better.
>>
>> See, the hardware actually needs the dma_address on a page-by-page basis.
>>
>> Joining multiple consecutive pages into one entry is just additional
>> overhead which we don't need.

> But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
> be in front of your GPU from mapping large pages as such, causing
> additional overhead by means of additional TLB misses and page walks on
> the IOMMU side.
>
> And wouldn't you like to use huge pages at the GPU side, if the IOMMU
> already provides you the service of producing one large consecutive
> address range, rather than mapping them via a number of small pages?
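To make the trade-off in this exchange concrete, here is a simplified model of segment merging. It is not how any particular DMA API implementation works: `struct sketch_chunk` and `sketch_coalesce()` are hypothetical, and the model assumes each input chunk has already been assigned an I/O virtual address (IOVA). Chunks whose IOVAs are back to back are merged into one DMA segment unless the merged length would exceed `max_seg_size`; the return value plays the role of the count `dma_map_sg()` returns, which can be smaller than the number of input entries. The model only shows how a segment-size cap fragments the list the driver sees; it does not model IOMMU page tables or TLB behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input/output element: an IOVA range. */
struct sketch_chunk {
    uint64_t iova; /* I/O virtual address assigned to this range */
    size_t   len;  /* length in bytes */
};

/* Merge IOVA-contiguous chunks into DMA segments no larger than max_seg_size.
 * out must have room for n entries. Returns the number of segments written,
 * which is always <= n. A single chunk larger than max_seg_size is passed
 * through unchanged; splitting it is outside this sketch. */
size_t sketch_coalesce(const struct sketch_chunk *in, size_t n,
                       size_t max_seg_size, struct sketch_chunk *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_chunk *last = count ? &out[count - 1] : NULL;

        if (last && last->iova + last->len == in[i].iova &&
            last->len + in[i].len <= max_seg_size) {
            last->len += in[i].len;   /* contiguous and under the cap: extend */
        } else {
            out[count++] = in[i];     /* gap or cap reached: new segment */
        }
    }
    return count;
}
```

With 512 back-to-back 4 KiB chunks (2 MiB in total), a 2 MiB cap yields a single segment, whereas a 64 KiB cap yields 32 segments of 64 KiB each.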


No, I wouldn't like to use that. We're already using that :)

But we use huge pages by allocating consecutive chunks of memory so that both the CPU and the GPU can increase their TLB hit rate.

What currently happens is that the DMA subsystem tries to coalesce multiple pages into one SG entry, and we de-coalesce that inside the driver again to create our random-access array.

That is a huge waste of memory and CPU cycles, and I have actually wanted to get rid of it for quite some time now. Sina's approach seems to me to be a good step in the right direction towards actually cleaning that up.

> Detecting such a condition is much easier if the DMA map implementation
> gives you the coalesced scatter entries.
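Lucas's point about detection can be illustrated with a small check, written here as a hypothetical helper rather than anything in the DRM or TTM code. Assuming the GPU can use a huge page wherever a DMA segment fully contains a range that is both aligned to and as large as the huge page size (the 2 MiB used below is an assumption, and the size must be a power of two), a single pass over coalesced segments finds those ranges directly. The same 4 MiB buffer described as 1024 separate 4 KiB entries yields no candidates with this per-segment check, so a driver receiving per-page entries would first have to re-merge them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA segment as returned (possibly coalesced) by a map call. */
struct sketch_dma_seg {
    uint64_t addr; /* DMA (bus/IOVA) address */
    uint64_t len;  /* length in bytes */
};

/* Count huge-page-sized, huge-page-aligned blocks that lie entirely inside a
 * single DMA segment. huge must be a power of two (e.g. 2 MiB, assumed). */
uint64_t sketch_count_huge_blocks(const struct sketch_dma_seg *segs, size_t n,
                                  uint64_t huge)
{
    uint64_t blocks = 0;

    for (size_t i = 0; i < n; i++) {
        /* Round the start up and the end down to huge-page boundaries. */
        uint64_t start = (segs[i].addr + huge - 1) & ~(huge - 1);
        uint64_t end = (segs[i].addr + segs[i].len) & ~(huge - 1);

        if (end > start)
            blocks += (end - start) / huge;
    }
    return blocks;
}
```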

A way which preserves both paths would indeed be nice to have, but that only allows for the TLB optimization on the GPU and not on the CPU anymore. So I actually see that as a really minor use case.


Regards,
Christian.


> Regards,
> Lucas


_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
