Olivier, 

>> It's because we haven't gotten to testing the patch yet and figuring out
>> all the problems. Putting it in and modifying the mbuf needs a bit of time.
>> One other option that I've looked at is to let the transmit offload parts
>> (except for the VLAN) flow onto the second cache line. That doesn't seem
>> to have a performance hit at this point: since it's going to be
>> populated before calling transmit anyway, it's cache hot. Have we thought
>> of simply doing that instead of these changes, which have net negative side
>> effects in terms of mbuf modifications?

> I think that the performance gain on a real use case provided by this patch 
> series can justify a really small impact (see my test reports) on 
> demonstration-only applications: my testpmd iofwd test with the txqflags 
> option disabling many mbuf features is not representative of a real world 
> application.

[Venky] I did see your test reports. I also made the point that the tests we
have are insufficient for measuring the impact. If you look at data_ofs, it
actually has an impact on two sides: the driver and the upper layer. We do not
at this time have a test for the upper layer/accessor. In addition, there is a
whole class of applications (fast-path routing, for example) outside your
purview that do not need txqflags. Calling that configuration not
representative of a real-world application is incorrect.

Also, the baseline performance of your testpmd run should be higher. It is
currently about 40% below the baseline numbers we see on the same CPU. If the
baseline is incorrect, I cannot draw any further conclusions about the
performance. We need to get the baseline performance to match first, and then
compare the impact.

> In my opinion, moving the offload parts out to another cache line would have
> an impact on performance. If not, why would you exclude VLAN?
> But this is speculation. As Neil and Thomas suggested previously, we should
> rely on performance and functional tests.

[Venky] I exclude VLAN because it is explicitly set by the Rx side of the
driver. Having Rx access a second cache line would cost performance (this can
be mitigated with a prefetch, but that costs more instructions, and its effect
cannot be deterministically controlled). The rest of those fields are used on
the transmit side, where they are going to be cache hot, at least in the LLC.
There are cases where they will not be in the LLC, and we have a few of those,
but those we can mitigate.
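For illustration only, here is a minimal C sketch of the layout split being discussed, assuming 64-byte cache lines. The struct sketch_mbuf, its field set, and the helper cache_line_of() are hypothetical simplifications, not the real struct rte_mbuf and not anything from the patch series. Fields the Rx path writes (including vlan_tci) stay in the first cache line, while the Tx offload fields start the second one, and C11 compile-time checks enforce that split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical, heavily simplified mbuf; NOT the real struct rte_mbuf. */
struct sketch_mbuf {
    /* First cache line: fields written by the Rx path. */
    void     *buf_addr;
    uint64_t  buf_physaddr;
    uint32_t  pkt_len;
    uint16_t  data_off;
    uint16_t  data_len;
    uint16_t  refcnt;
    uint16_t  vlan_tci;   /* set by Rx, so it stays in the first line */
    uint8_t   nb_segs;
    uint8_t   port;
    uint32_t  rss_hash;

    /* Second cache line: Tx offload metadata, filled in just before
     * transmit, so it is expected to be cache hot when the driver reads it. */
    _Alignas(CACHE_LINE_SIZE) uint64_t tx_ol_flags;
    uint16_t  l2_len;
    uint16_t  l3_len;
    uint16_t  l4_len;
    uint16_t  tso_segsz;
};

/* Compile-time checks that the intended split actually holds. */
_Static_assert(offsetof(struct sketch_mbuf, vlan_tci) + sizeof(uint16_t)
                   <= CACHE_LINE_SIZE,
               "vlan_tci must stay in the first cache line");
_Static_assert(offsetof(struct sketch_mbuf, tx_ol_flags) == CACHE_LINE_SIZE,
               "Tx offload fields must start the second cache line");

/* Cache line index (relative to the start of the struct) of a byte offset. */
size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE_SIZE;
}
```

For example, cache_line_of(offsetof(struct sketch_mbuf, l2_len)) returns 1, so an Rx path that only touches the first-line fields never pulls the second line in.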

> Today, there is no alternative that brings equivalent features and better
> performance (I mean there is no patch nor test reports). If the series is
> applied after your ack, it won't prevent anyone from bringing new enhancements
> or reworks on top of it.

[Venky] I don't think reworking core data structures (especially regressing
core data structures) is a good idea. We have kept this structure relatively
stable over 5 releases, sometimes at the expense of performance, and churning
core data structures is something we should avoid.

BR,
- Venky
