On Fri, Jan 09, 2026 at 03:13:24PM -0500, Frank Li wrote:
> This patch depends on
> https://lore.kernel.org/imx/[email protected]/T/#t
>
> Only tested with eDMA; HDMA has not been tested.
Hi Frank,

I expect this series may be revisited in the near future, since the first
dependency series reached v7 and looks close to landing.

With the latest versions of the two dependencies:
- [PATCH v7 0/9] dmaengine: Add new API to combine configuration and
descriptor preparation
https://lore.kernel.org/dmaengine/[email protected]/
- [PATCH v2 00/11] dmaengine: dw-edma: flatten desc structions and simple code
https://lore.kernel.org/dmaengine/[email protected]/
I tested this RFT series with the HDMA engine on a SpacemiT K3.
The test results are below, using the same format as your results:

Baseline, before applying the three series (v7 + v2 + this RFT)
Rnd read , 4KB, QD=1 , 1 job : IOPS=8567, BW=33.5MiB/s (35.1MB/s)
Rnd read , 4KB, QD=32, 1 job : IOPS=55.5k, BW=217MiB/s (227MB/s)
Rnd read , 4KB, QD=32, 4 jobs: IOPS=83.0k, BW=324MiB/s (340MB/s)
Rnd read , 128KB, QD=1 , 1 job : IOPS=3817, BW=477MiB/s (500MB/s)
Rnd read , 128KB, QD=32, 1 job : IOPS=10.8k, BW=1346MiB/s (1411MB/s)
Rnd read , 128KB, QD=32, 4 jobs: IOPS=11.2k, BW=1403MiB/s (1471MB/s)
Rnd read , 512KB, QD=1 , 1 job : IOPS=1515, BW=758MiB/s (794MB/s)
Rnd read , 512KB, QD=32, 1 job : IOPS=2795, BW=1399MiB/s (1467MB/s)
Rnd read , 512KB, QD=32, 4 jobs: IOPS=2795, BW=1404MiB/s (1472MB/s)
Rnd write, 4KB, QD=1 , 1 job : IOPS=9035, BW=35.3MiB/s (37.0MB/s)
Rnd write, 4KB, QD=32, 1 job : IOPS=38.3k, BW=149MiB/s (157MB/s)
Rnd write, 4KB, QD=32, 4 jobs: IOPS=41.8k, BW=163MiB/s (171MB/s)
Rnd write, 128KB, QD=1 , 1 job : IOPS=3969, BW=496MiB/s (520MB/s)
Rnd write, 128KB, QD=32, 1 job : IOPS=8260, BW=1033MiB/s (1083MB/s)
Rnd write, 128KB, QD=32, 4 jobs: IOPS=8295, BW=1038MiB/s (1089MB/s)
Seq read , 128KB, QD=1 , 1 job : IOPS=4609, BW=576MiB/s (604MB/s)
Seq read , 128KB, QD=32, 1 job : IOPS=10.8k, BW=1345MiB/s (1410MB/s)
Seq read , 512KB, QD=1 , 1 job : IOPS=1524, BW=762MiB/s (799MB/s)
Seq read , 512KB, QD=32, 1 job : IOPS=2799, BW=1401MiB/s (1469MB/s)
Seq read , 1MB, QD=32, 1 job : IOPS=1401, BW=1404MiB/s (1472MB/s)
Seq write, 128KB, QD=1 , 1 job : IOPS=3722, BW=465MiB/s (488MB/s)
Seq write, 128KB, QD=32, 1 job : IOPS=8246, BW=1031MiB/s (1081MB/s)
Seq write, 512KB, QD=1 , 1 job : IOPS=1283, BW=642MiB/s (673MB/s)
Seq write, 512KB, QD=32, 1 job : IOPS=2072, BW=1038MiB/s (1088MB/s)
Seq write, 1MB, QD=32, 1 job : IOPS=1037, BW=1040MiB/s (1091MB/s)
Rnd rdwr , 4K..1MB, QD=8 , 4 jobs: IOPS=1540, BW=768MiB/s (805MB/s)
IOPS=1549, BW=768MiB/s (805MB/s)
After your three series (v7 + v2 + this)
Rnd read , 4KB, QD=1 , 1 job : IOPS=7216, BW=28.2MiB/s (29.6MB/s)
Rnd read , 4KB, QD=32, 1 job : IOPS=61.1k, BW=239MiB/s (250MB/s)
Rnd read , 4KB, QD=32, 4 jobs: IOPS=75.3k, BW=294MiB/s (309MB/s)
Rnd read , 128KB, QD=1 , 1 job : IOPS=4711, BW=589MiB/s (618MB/s)
Rnd read , 128KB, QD=32, 1 job : IOPS=10.8k, BW=1354MiB/s (1420MB/s)
Rnd read , 128KB, QD=32, 4 jobs: IOPS=11.2k, BW=1403MiB/s (1471MB/s)
Rnd read , 512KB, QD=1 , 1 job : IOPS=1497, BW=749MiB/s (785MB/s)
Rnd read , 512KB, QD=32, 1 job : IOPS=2802, BW=1403MiB/s (1471MB/s)
Rnd read , 512KB, QD=32, 4 jobs: IOPS=2798, BW=1405MiB/s (1474MB/s)
Rnd write, 4KB, QD=1 , 1 job : IOPS=7411, BW=29.0MiB/s (30.4MB/s)
Rnd write, 4KB, QD=32, 1 job : IOPS=39.3k, BW=153MiB/s (161MB/s)
Rnd write, 4KB, QD=32, 4 jobs: IOPS=42.9k, BW=167MiB/s (176MB/s)
Rnd write, 128KB, QD=1 , 1 job : IOPS=3736, BW=467MiB/s (490MB/s)
Rnd write, 128KB, QD=32, 1 job : IOPS=8302, BW=1038MiB/s (1089MB/s)
Rnd write, 128KB, QD=32, 4 jobs: IOPS=8314, BW=1041MiB/s (1091MB/s)
Seq read , 128KB, QD=1 , 1 job : IOPS=4092, BW=512MiB/s (536MB/s)
Seq read , 128KB, QD=32, 1 job : IOPS=10.8k, BW=1354MiB/s (1420MB/s)
Seq read , 512KB, QD=1 , 1 job : IOPS=1474, BW=737MiB/s (773MB/s)
Seq read , 512KB, QD=32, 1 job : IOPS=2794, BW=1399MiB/s (1467MB/s)
Seq read , 1MB, QD=32, 1 job : IOPS=1401, BW=1404MiB/s (1472MB/s)
Seq write, 128KB, QD=1 , 1 job : IOPS=4135, BW=517MiB/s (542MB/s)
Seq write, 128KB, QD=32, 1 job : IOPS=8307, BW=1039MiB/s (1089MB/s)
Seq write, 512KB, QD=1 , 1 job : IOPS=1259, BW=630MiB/s (660MB/s)
Seq write, 512KB, QD=32, 1 job : IOPS=2073, BW=1038MiB/s (1089MB/s)
Seq write, 1MB, QD=32, 1 job : IOPS=1034, BW=1038MiB/s (1088MB/s)
Rnd rdwr , 4K..1MB, QD=8 , 4 jobs: IOPS=1531, BW=763MiB/s (801MB/s)
IOPS=1540, BW=765MiB/s (802MB/s)
On this HDMA setup, I did not observe a clear performance difference from
applying the three series alone. Still, I like the overall direction.
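
One way to put a number on "no clear difference" is the relative bandwidth
change per row. The following is a minimal sketch, assuming the summary-line
format used in the tables above; the helper names `parse_bw` and `pct_change`
are hypothetical and are not part of fio or of the test setup. The two sample
lines are copied from the baseline and "after" tables.

```python
import re

# Matches the "BW=<value>MiB/s" field of a summary line as printed above.
BW_RE = re.compile(r"BW=([0-9.]+)MiB/s")


def parse_bw(line: str) -> float:
    """Return the bandwidth in MiB/s from one summary line (hypothetical helper)."""
    m = BW_RE.search(line)
    if m is None:
        raise ValueError(f"no BW field in: {line!r}")
    return float(m.group(1))


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; positive means 'after' is faster."""
    return (after - before) / before * 100.0


# Same workload, taken from the baseline table and the three-series table.
baseline = "Rnd read , 128KB, QD=32, 1 job : IOPS=10.8k, BW=1346MiB/s (1411MB/s)"
patched = "Rnd read , 128KB, QD=32, 1 job : IOPS=10.8k, BW=1354MiB/s (1420MB/s)"

delta = pct_change(parse_bw(baseline), parse_bw(patched))
print(f"{delta:+.1f}%")  # prints "+0.6%"
```

A change well under 1% on a single run is plausibly within run-to-run noise,
which is consistent with the observation above.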
P.S.
Separately, as a follow-up experiment, I also prototyped an extra series on top
of your three series that makes use of HDMA watermark interrupts.
With that series, in particular for the high queue-depth cases, the results
improved noticeably on this platform. I haven't posted that series yet, though.

After your three series (v7 + v2 + this) + use of HDMA watermark interrupts
Rnd read , 4KB, QD=1 , 1 job : IOPS=8016, BW=31.3MiB/s (32.8MB/s)
Rnd read , 4KB, QD=32, 1 job : IOPS=63.4k, BW=248MiB/s (260MB/s)
Rnd read , 4KB, QD=32, 4 jobs: IOPS=92.7k, BW=362MiB/s (380MB/s)
Rnd read , 128KB, QD=1 , 1 job : IOPS=3530, BW=441MiB/s (463MB/s)
Rnd read , 128KB, QD=32, 1 job : IOPS=12.0k, BW=1500MiB/s (1573MB/s)
Rnd read , 128KB, QD=32, 4 jobs: IOPS=12.4k, BW=1555MiB/s (1631MB/s)
Rnd read , 512KB, QD=1 , 1 job : IOPS=1541, BW=771MiB/s (808MB/s)
Rnd read , 512KB, QD=32, 1 job : IOPS=3116, BW=1560MiB/s (1636MB/s)
Rnd read , 512KB, QD=32, 4 jobs: IOPS=3099, BW=1556MiB/s (1632MB/s)
Rnd write, 4KB, QD=1 , 1 job : IOPS=8748, BW=34.2MiB/s (35.8MB/s)
Rnd write, 4KB, QD=32, 1 job : IOPS=57.6k, BW=225MiB/s (236MB/s)
Rnd write, 4KB, QD=32, 4 jobs: IOPS=80.3k, BW=314MiB/s (329MB/s)
Rnd write, 128KB, QD=1 , 1 job : IOPS=3878, BW=485MiB/s (508MB/s)
Rnd write, 128KB, QD=32, 1 job : IOPS=9798, BW=1225MiB/s (1285MB/s)
Rnd write, 128KB, QD=32, 4 jobs: IOPS=9970, BW=1248MiB/s (1308MB/s)
Seq read , 128KB, QD=1 , 1 job : IOPS=4516, BW=565MiB/s (592MB/s)
Seq read , 128KB, QD=32, 1 job : IOPS=12.0k, BW=1497MiB/s (1570MB/s)
Seq read , 512KB, QD=1 , 1 job : IOPS=1571, BW=786MiB/s (824MB/s)
Seq read , 512KB, QD=32, 1 job : IOPS=3073, BW=1538MiB/s (1613MB/s)
Seq read , 1MB, QD=32, 1 job : IOPS=1573, BW=1576MiB/s (1653MB/s)
Seq write, 128KB, QD=1 , 1 job : IOPS=3977, BW=497MiB/s (521MB/s)
Seq write, 128KB, QD=32, 1 job : IOPS=9806, BW=1226MiB/s (1286MB/s)
Seq write, 512KB, QD=1 , 1 job : IOPS=1404, BW=702MiB/s (736MB/s)
Seq write, 512KB, QD=32, 1 job : IOPS=2496, BW=1250MiB/s (1310MB/s)
Seq write, 1MB, QD=32, 1 job : IOPS=1252, BW=1256MiB/s (1317MB/s)
Rnd rdwr , 4K..1MB, QD=8 , 4 jobs: IOPS=1682, BW=836MiB/s (877MB/s)
IOPS=1688, BW=838MiB/s (879MB/s)
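
To see how the gain splits by queue depth, the speedups can be grouped and
averaged with a geometric mean. This is only a sketch under stated
assumptions: it uses four rows picked for illustration (a real comparison
would use every row), the bandwidth values are copied from the "three series"
and "watermark interrupts" tables in this mail, and `speedup_by_qd` is a
hypothetical helper name.

```python
from collections import defaultdict
from statistics import geometric_mean

# (workload, queue depth, MiB/s with the three series, MiB/s with watermark IRQs)
# Illustrative subset only; values copied from the two result tables above.
rows = [
    ("Rnd read 128KB, 1 job", 1, 589.0, 441.0),
    ("Rnd write 4KB, 1 job", 1, 29.0, 34.2),
    ("Rnd write 4KB, 4 jobs", 32, 167.0, 314.0),
    ("Seq read 1MB, 1 job", 32, 1404.0, 1576.0),
]


def speedup_by_qd(samples):
    """Group after/before bandwidth ratios by queue depth and return the
    geometric mean of the ratios for each depth (hypothetical helper)."""
    ratios = defaultdict(list)
    for _name, qd, before, after in samples:
        ratios[qd].append(after / before)
    return {qd: geometric_mean(vals) for qd, vals in ratios.items()}


summary = speedup_by_qd(rows)
for qd in sorted(summary):
    print(f"QD={qd}: x{summary[qd]:.2f}")
```

The geometric mean is used because the values being averaged are ratios; an
arithmetic mean would overweight the rows with the largest speedup.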
Best regards,
Koichiro
> Corner cases, such as pause/resume transfer, have not been tested.
>
> Before
>
> Rnd read, 4KB, QD=1, 1 job : IOPS=6780, BW=26.5MiB/s (27.8MB/s)
> Rnd read, 4KB, QD=32, 1 job : IOPS=28.6k, BW=112MiB/s (117MB/s)
> Rnd read, 4KB, QD=32, 4 jobs: IOPS=33.4k, BW=130MiB/s (137MB/s)
> Rnd read, 128KB, QD=1, 1 job : IOPS=1188, BW=149MiB/s (156MB/s)
> Rnd read, 128KB, QD=32, 1 job : IOPS=1440, BW=180MiB/s (189MB/s)
> Rnd read, 128KB, QD=32, 4 jobs: IOPS=1282, BW=160MiB/s (168MB/s)
> Rnd read, 512KB, QD=1, 1 job : IOPS=254, BW=127MiB/s (134MB/s)
> Rnd read, 512KB, QD=32, 1 job : IOPS=354, BW=177MiB/s (186MB/s)
> Rnd read, 512KB, QD=32, 4 jobs: IOPS=388, BW=194MiB/s (204MB/s)
> Rnd write, 4KB, QD=1, 1 job : IOPS=6282, BW=24.5MiB/s (25.7MB/s)
> Rnd write, 4KB, QD=32, 1 job : IOPS=24.9k, BW=97.5MiB/s (102MB/s)
> Rnd write, 4KB, QD=32, 4 jobs: IOPS=27.4k, BW=107MiB/s (112MB/s)
> Rnd write, 128KB, QD=1, 1 job : IOPS=1098, BW=137MiB/s (144MB/s)
> Rnd write, 128KB, QD=32, 1 job : IOPS=1195, BW=149MiB/s (157MB/s)
> Rnd write, 128KB, QD=32, 4 jobs: IOPS=1120, BW=140MiB/s (147MB/s)
> Seq read, 128KB, QD=1, 1 job : IOPS=936, BW=117MiB/s (123MB/s)
> Seq read, 128KB, QD=32, 1 job : IOPS=1218, BW=152MiB/s (160MB/s)
> Seq read, 512KB, QD=1, 1 job : IOPS=301, BW=151MiB/s (158MB/s)
> Seq read, 512KB, QD=32, 1 job : IOPS=360, BW=180MiB/s (189MB/s)
> Seq read, 1MB, QD=32, 1 job : IOPS=193, BW=194MiB/s (203MB/s)
> Seq write, 128KB, QD=1, 1 job : IOPS=796, BW=99.5MiB/s (104MB/s)
> Seq write, 128KB, QD=32, 1 job : IOPS=1019, BW=127MiB/s (134MB/s)
> Seq write, 512KB, QD=1, 1 job : IOPS=213, BW=107MiB/s (112MB/s)
> Seq write, 512KB, QD=32, 1 job : IOPS=273, BW=137MiB/s (143MB/s)
> Seq write, 1MB, QD=32, 1 job : IOPS=168, BW=168MiB/s (177MB/s)
> Rnd rdwr, 4K..1MB, QD=8, 4 jobs: IOPS=255, BW=128MiB/s (134MB/s)
> IOPS=266, BW=135MiB/s (141MB/s)
>
> After
>
> Rnd read, 4KB, QD=1, 1 job : IOPS=6148, BW=24.0MiB/s (25.2MB/s)
> Rnd read, 4KB, QD=32, 1 job : IOPS=29.4k, BW=115MiB/s (121MB/s)
> Rnd read, 4KB, QD=32, 4 jobs: IOPS=38.8k, BW=151MiB/s (159MB/s)
> Rnd read, 128KB, QD=1, 1 job : IOPS=859, BW=107MiB/s (113MB/s)
> Rnd read, 128KB, QD=32, 1 job : IOPS=1504, BW=188MiB/s (197MB/s)
> Rnd read, 128KB, QD=32, 4 jobs: IOPS=1531, BW=191MiB/s (201MB/s)
> Rnd read, 512KB, QD=1, 1 job : IOPS=238, BW=119MiB/s (125MB/s)
> Rnd read, 512KB, QD=32, 1 job : IOPS=390, BW=195MiB/s (205MB/s)
> Rnd read, 512KB, QD=32, 4 jobs: IOPS=404, BW=202MiB/s (212MB/s)
> Rnd write, 4KB, QD=1, 1 job : IOPS=5801, BW=22.7MiB/s (23.8MB/s)
> Rnd write, 4KB, QD=32, 1 job : IOPS=24.7k, BW=96.6MiB/s (101MB/s)
> Rnd write, 4KB, QD=32, 4 jobs: IOPS=32.7k, BW=128MiB/s (134MB/s)
> Rnd write, 128KB, QD=1, 1 job : IOPS=744, BW=93.1MiB/s (97.6MB/s)
> Rnd write, 128KB, QD=32, 1 job : IOPS=1278, BW=160MiB/s (168MB/s)
> Rnd write, 128KB, QD=32, 4 jobs: IOPS=1278, BW=160MiB/s (168MB/s)
> Seq read, 128KB, QD=1, 1 job : IOPS=853, BW=107MiB/s (112MB/s)
> Seq read, 128KB, QD=32, 1 job : IOPS=1511, BW=189MiB/s (198MB/s)
> Seq read, 512KB, QD=1, 1 job : IOPS=240, BW=120MiB/s (126MB/s)
> Seq read, 512KB, QD=32, 1 job : IOPS=386, BW=193MiB/s (203MB/s)
> Seq read, 1MB, QD=32, 1 job : IOPS=200, BW=201MiB/s (211MB/s)
> Seq write, 128KB, QD=1, 1 job : IOPS=749, BW=93.7MiB/s (98.3MB/s)
> Seq write, 128KB, QD=32, 1 job : IOPS=1266, BW=158MiB/s (166MB/s)
> Seq write, 512KB, QD=1, 1 job : IOPS=198, BW=99.0MiB/s (104MB/s)
> Seq write, 512KB, QD=32, 1 job : IOPS=352, BW=176MiB/s (185MB/s)
> Seq write, 1MB, QD=32, 1 job : IOPS=184, BW=184MiB/s (193MB/s)
> Rnd rdwr, 4K..1MB, QD=8, 4 jobs: IOPS=287, BW=145MiB/s (152MB/s)
> IOPS=299, BW=149MiB/s (156MB/s)
>
> Signed-off-by: Frank Li <[email protected]>
> ---
> Frank Li (5):
> dmaengine: dw-edma: Add dw_edma_core_ll_cur_idx() to get completed link entry pos
> dmaengine: dw-edma: Move dw_hdma_set_callback_result() up
> dmaengine: dw-edma: Make DMA link list work as a circular buffer
> dmaengine: dw-edma: Dynamitc append new request during dmaengine running
> dmaengine: dw-edma: Add trace support
>
> drivers/dma/dw-edma/Makefile | 3 +
> drivers/dma/dw-edma/dw-edma-core.c | 215 ++++++++++++++++++++++++----------
> drivers/dma/dw-edma/dw-edma-core.h | 42 ++++++-
> drivers/dma/dw-edma/dw-edma-trace.c | 4 +
> drivers/dma/dw-edma/dw-edma-trace.h | 150 ++++++++++++++++++++++++
> drivers/dma/dw-edma/dw-edma-v0-core.c | 39 +++++-
> drivers/dma/dw-edma/dw-hdma-v0-core.c | 17 +++
> 7 files changed, 409 insertions(+), 61 deletions(-)
> ---
> base-commit: 020f6d8442f35105660a29d0d236d3f8650c8142
> change-id: 20251212-edma_dymatic-a57843ff0dfe
>
> Best regards,
> --
> Frank Li <[email protected]>
>