Olivier,
>> It's because we haven't gotten to testing the patch yet, and figuring out
>> all the problems. Putting it in and modifying MBUF needs a bit of time -
>> one other option that I've looked at is to let the transmit offload parts
>> (except for the VLAN) flow onto the second cache line. That doesn't seem
>> to have a performance hit at this point - since it's going to be
>> populated before calling transmit anyway, it's cache hot. Have we thought
>> of simply doing that instead of these changes that have net negative side
>> effects in terms of mbuf mods?
>
> I think that the performance gain on a real use case provided by this patch
> series can justify a really small impact (see my test reports) on
> demonstration-only applications: my testpmd iofwd test with the txqflags
> option disabling many mbuf features is not representative of a real world
> application.

[Venky] I did see your test reports. I also made the point that the tests we have are insufficient for testing the impact. If you look at data_ofs, it actually has an impact on two sides - the driver and the upper layer. We do not at this time have a test for the upper layer/accessor. Secondly, there is a whole class of apps (fast path route for example) that aren't in your purview that do not need txqflags. Calling it not representative of a real world application is incorrect. Secondly, your testpmd run baseline performance should be higher. At this point, it is about 40% off from the numbers we see on the baseline on the same CPU. If the baseline is incorrect, I cannot judge anything more on the performance. We need to get the baseline performance the same, and then compare impact.

> In my opinion, moving offload parts outside in another cache line would have
> an impact on performance. If not, why would you exclude vlan? But this is
> speculation. As Neil and Thomas suggested previously, we should rely on
> performance and functional tests.

[Venky] I exclude VLAN because it is something explicitly set by the Rx side of the driver. Having Rx access a second cache line will generate a performance impact (can be mitigated by a prefetch, but it will cost more instructions, and cannot be deterministically controlled). The rest of the structure is on the transmit side - which is going to be cache hot - at least in LLC anyway. There are cases where this will not be in LLC - and we have a few of those. Those however, we can mitigate.

> Today, there is no alternative that brings equivalent features and better
> performance (I mean there is no patch nor test reports). If the series is
> applied after your ack, it won't prevent anyone to bring new enhancements or
> reworks on top it.

[Venky] I don't think reworking core data structures (especially regressing core data structures) is a good thing. We have kept this relatively stable over 5 releases, sometimes at the impact of performance, and churning data structures is not a good thing.

BR,
- Venky
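
For illustration only, the layout option discussed above (VLAN tag kept on the first cache line because the Rx side of the driver writes it, remaining transmit offload fields allowed onto the second cache line) could be sketched roughly as follows. This is not the real struct rte_mbuf: the struct name, the field names, the helper function and the 64-byte cache line size are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for this sketch. */
#define SKETCH_CACHE_LINE 64

/* Hypothetical layout, NOT the real struct rte_mbuf. */
struct sketch_mbuf {
    /* First cache line: fields the Rx side of a driver fills in. */
    void    *buf_addr;
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t vlan_tci;   /* set by Rx, so it stays on the first line */

    /* Second cache line: transmit offload fields, which the application
     * populates just before calling transmit. */
    alignas(SKETCH_CACHE_LINE) uint64_t tx_offload;
};

/* Index of the cache line that a given byte offset falls into. */
static inline int sketch_cache_line_of(size_t offset)
{
    return (int)(offset / SKETCH_CACHE_LINE);
}

/* Compile-time checks that the sketch has the intended split. */
static_assert(offsetof(struct sketch_mbuf, vlan_tci) < SKETCH_CACHE_LINE,
              "vlan_tci must stay on the first cache line");
static_assert(offsetof(struct sketch_mbuf, tx_offload) == SKETCH_CACHE_LINE,
              "tx_offload must start the second cache line");
```

The static assertions only pin down where the fields land. Whether touching the second line on transmit costs anything is exactly the kind of question that has to be answered by measurement, not by the layout alone.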