On 6/18/20 2:50 PM, Jonathan Lemon wrote:
> On Thu, Jun 18, 2020 at 11:12:57AM -0700, Eric Dumazet wrote:
>>
>>
>> On 6/18/20 9:09 AM, Jonathan Lemon wrote:
>>> Adds a "rx_hd_split" private flag parameter to ethtool.
>>>
>>> This enables header splitting, and sets up the fragment mappings.
>>> The feature is currently only enabled for netgpu channels.
>>
>> We are using a similar idea (pseudo header split) to implement
>> 4096+(headers) MTU at Google, to enable TCP RX zerocopy on x86.
>>
>> Patch for mlx4 has not been sent upstream yet.
>>
>> For mlx4, we are using a single buffer of 128*(number_of_slots_per_RX_RING),
>> and 86 bytes for the first frag, so that the payload exactly fits a 4096
>> bytes page.
>>
>> (In our case, most of our data TCP packets only have 12 bytes of TCP options)
>>
>>
>> I suggest that instead of a flag, you use a tunable that can be set by
>> ethtool, so that the exact number of bytes can be tuned instead of being
>> hard-coded in the driver.
>
> Sounds reasonable - in the long run, it would be ideal to have the
> hardware actually perform header splitting, but for now using a tunable
> fixed offset will work. In the same vein, there should be a similar
> setting for the TCP option padding on the sender side.
>
Some NICs have variable header split (Intel ixgbe, I am pretty sure).
We use a mix of NICs, some with variable header split, some with fixed pseudo
header split (mlx4).
Because of this, we had to limit TCP advmss to 4108 (4096 + 12), regardless of
the NIC abilities.
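
To make the arithmetic concrete, here is a rough sketch of how the numbers
above fit together. Only 86, 12, 4096 and 4108 come from this thread; the
per-layer breakdown (Ethernet 14 + IPv6 40 + TCP 20 + options 12) is an
assumption that happens to add up to 86, and the function names are made up
for illustration.

```python
# Assumed header sizes (not stated in the thread, only their sum of 86 is).
ETH_HLEN = 14      # Ethernet header, no VLAN tag
IPV6_HLEN = 40     # fixed IPv6 header, no extension headers
TCP_HLEN = 20      # TCP header without options
TCP_OPT_LEN = 12   # e.g. timestamp option padded to 12 bytes
PAGE_SIZE = 4096   # payload we want to land in exactly one page


def header_len(opt_len=TCP_OPT_LEN):
    """Bytes of L2+L3+L4 headers in front of the TCP payload."""
    return ETH_HLEN + IPV6_HLEN + TCP_HLEN + opt_len


def advmss_for_page_payload(opt_len=TCP_OPT_LEN):
    """MSS to advertise so that payload + options leaves a 4096-byte payload.

    The MSS covers TCP options plus payload, so it is PAGE_SIZE + opt_len.
    """
    return PAGE_SIZE + opt_len


def mtu_for_page_payload(opt_len=TCP_OPT_LEN):
    """IP MTU needed to carry one full page of payload."""
    return IPV6_HLEN + TCP_HLEN + advmss_for_page_payload(opt_len)


def header_spill(split, opt_len):
    """Header bytes that land in the payload page with a fixed split point.

    A non-zero result means the payload no longer starts on a page boundary.
    """
    return max(0, header_len(opt_len) - split)


if __name__ == "__main__":
    print(header_len())               # first frag size
    print(advmss_for_page_payload())  # advertised MSS
    print(mtu_for_page_payload())     # "4096+(headers)" MTU
    print(header_spill(86, 12))       # common case: aligned
    print(header_spill(86, 24))       # more options: misaligned
```

With a fixed split, any packet whose options are not exactly 12 bytes ends up
misaligned in the page, which is why a per-packet (variable) split in hardware
would be preferable.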