On Thu, Nov 21, 2024 at 11:06 AM Keith Busch <[email protected]> wrote:
> If you have the time, could you compare with using xarray instead?

Sure. Good idea.

**With the submitted patches applied AND using an xarray for
vaddr-to-block translations:**
```
dmapool test: size:16   align:16   blocks:8192 time:37954
dmapool test: size:64   align:64   blocks:8192 time:40036
dmapool test: size:256  align:256  blocks:8192 time:41942
dmapool test: size:1024 align:1024 blocks:2048 time:10964
dmapool test: size:4096 align:4096 blocks:1024 time:6101
dmapool test: size:68   align:32   blocks:8192 time:41307
```

The xarray approach shows a slight improvement in performance compared
to the maple tree approach.

FWIW, I implemented the two with slightly different semantics:
- In the maple tree implementation, I saved the `block`'s entire
`vaddr` range, allowing any `vaddr` within the `block` to be passed to
`dma_pool_free()`.
- In the xarray implementation, I saved only the `block`'s base
`vaddr`, requiring `dma_pool_free()` to be called with the exact
`vaddr` returned by `dma_pool_alloc()`. This aligns with the DMA pool
API documentation, which specifies that the `vaddr` returned by
`dma_pool_alloc()` should be passed to `dma_pool_free()`.
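
To make the difference concrete, here is a minimal user-space sketch of the two lookup semantics. It is not the patch code: `struct block`, `lookup_range()`, `lookup_exact()`, and `demo_lookup_semantics()` are hypothetical names, and a linear scan over an array stands in for the maple tree and the xarray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pool block; not the kernel's structure. */
struct block {
	uintptr_t base;	/* vaddr handed out by the allocator */
	size_t size;	/* block size in bytes */
};

/* Range semantics (maple tree variant): any vaddr inside a block matches. */
static struct block *lookup_range(struct block *blocks, size_t n,
				  uintptr_t vaddr)
{
	for (size_t i = 0; i < n; i++)
		if (vaddr >= blocks[i].base &&
		    vaddr - blocks[i].base < blocks[i].size)
			return &blocks[i];
	return NULL;
}

/* Exact semantics (xarray variant): only a block's base vaddr matches. */
static struct block *lookup_exact(struct block *blocks, size_t n,
				  uintptr_t vaddr)
{
	for (size_t i = 0; i < n; i++)
		if (vaddr == blocks[i].base)
			return &blocks[i];
	return NULL;
}

/* 0x1010 lies inside the first block but is not its base address. */
void demo_lookup_semantics(void)
{
	struct block blocks[] = { { 0x1000, 64 }, { 0x1040, 64 } };

	assert(lookup_range(blocks, 2, 0x1010) == &blocks[0]);
	assert(lookup_exact(blocks, 2, 0x1010) == NULL);
	assert(lookup_exact(blocks, 2, 0x1040) == &blocks[1]);
}
```

With exact semantics, a free of an interior address finds no block, so that kind of caller bug can be detected rather than silently accepted.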

Let me know if you'd like further adjustments.

Thanks,

Brian Johannesmeyer
