On Wed, Jun 24, 2026 at 10:59 AM Morten Brørup <[email protected]> wrote:
>
> +Pavan Nikhilesh, +Stephen Hemminger, +Wathsala Vithanage, +Bruce Richardson,
> +Thomas Monjalon
>
> > From: saeed bishara [mailto:[email protected]]
> > Sent: Tuesday, 23 June 2026 16.11
> >
> > > > also, instead of adding cacheline for this profiling data, can we
> > > > share with line 1 that used solely for xstats?
> > >
> > > This profiling data is 4 indexes * 2 values * 8-byte fields, so one
> > > cache line in itself.
> > make sense.
> > btw, the default value of RTE_GRAPH_BURST_SIZE is 256, I suspect that
> > real applications will enforce smaller burst when pulling from input
> > devices (e.g. 32). Do you expect such cases to change
> > RTE_GRAPH_BURST_SIZE?
>
> Excellent question! I don't know.
> They should. E.g. an application optimized for latency should certainly not
> process bursts of 256 objects.
>
> IMO, the root problem is the lack of a unified burst size across DPDK, which
> causes every library to be designed with its own optimal burst size.
> E.g. the Mbuf library uses 64 (for rte_pktmbuf_free_bulk()), and the Graph
> library uses 256.
>
> There has been an attempt at introducing a unified burst size [1] for DPDK,
> but it met a lot of resistance, so it still needs to be refined before we can
> reach a conclusion.
> The drivers supposedly can report an "optimal" burst size at run-time, which
> the application can then use. But the application is unable to configure its
> internal burst sizes if one driver reports 64 and another reports 32.
> I'm strongly in favor of a build time constant, used across DPDK. The default
> value should work reasonably well across drivers and libraries.
> And if an application wants to optimize for performance (either throughput or
> latency), the developer should experiment to find the optimal value.
> Furthermore, designing for a build time constant max burst size throughout
> DPDK might provide performance benefits in itself, as the compiler can
> optimize for this.
>
> [1]: https://inbox.dpdk.org/dev/[email protected]/
>
> Now, back to your question...
> As a workaround, I can sample Graph node performance data for 32 objects,
> instead of sampling for RTE_GRAPH_BURST_SIZE / 2.

I see, so there is no simple static parameter here.
What about tracking the max burst and then reporting the calls/cycles for that
case? The user would also see what that max burst was and how often it
occurred.
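
To make the max-burst idea concrete, here is a rough sketch. All names
(struct burst_prof, burst_prof_update(), burst_prof_cycles_per_obj()) are
hypothetical and not part of rte_graph; I assume the caller already measures
the cycles spent in each node call. The counters restart whenever a larger
burst is observed, so they always describe the current maximum.

```c
#include <stdint.h>

/* Hypothetical per-node profiling record; not an existing rte_graph type. */
struct burst_prof {
	uint16_t max_burst;   /* largest burst size observed so far */
	uint64_t max_calls;   /* calls that processed exactly max_burst objects */
	uint64_t max_cycles;  /* cycles spent in those calls */
	uint64_t total_calls; /* all non-empty calls, to show how often max occurs */
};

/* Record one node call that processed nb_objs objects in 'cycles' cycles. */
static inline void
burst_prof_update(struct burst_prof *p, uint16_t nb_objs, uint64_t cycles)
{
	if (nb_objs == 0)
		return; /* empty polls carry no per-object information */

	p->total_calls++;

	if (nb_objs > p->max_burst) {
		/* New maximum: restart the counters for the new burst size. */
		p->max_burst = nb_objs;
		p->max_calls = 0;
		p->max_cycles = 0;
	}

	if (nb_objs == p->max_burst) {
		p->max_calls++;
		p->max_cycles += cycles;
	}
}

/* Average cycles per object for calls at the max burst size (0 if none). */
static inline uint64_t
burst_prof_cycles_per_obj(const struct burst_prof *p)
{
	uint64_t objs = p->max_calls * p->max_burst;

	return objs != 0 ? p->max_cycles / objs : 0;
}
```

The report would then show max_burst, max_calls out of total_calls, and the
cycles per object at that burst size. One drawback of this sketch: a single
unusually large burst resets the statistics, so the numbers may be based on
very few samples.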
saeed

