On Wed, Apr 18, 2018 at 9:47 AM, Sowmini Varadhan
<sowmini.varad...@oracle.com> wrote:
> On (04/18/18 06:35), Eric Dumazet wrote:
>>
>> There is no change at all.
>>
>> This will only be used as a mechanism to send X packets of same size.
>>
>> So instead of X system calls , one system call.
>>
>> One traversal of some expensive part of the host stack.
>>
>> The content on the wire should be the same.
>
> I'm sorry that's not how I interpret Willem's email below
> (and maybe I misunderstood)
>
> the following taken from https://www.spinics.net/lists/netdev/msg496150.html
>
> Sowmini> If yes, how will the recvmsg differentiate between the case
> Sowmini> (2000 byte message followed by 512 byte message) and
> Sowmini> (1472 byte message, 526 byte message, then 512 byte message),
> Sowmini> in other words, how are UDP message boundary semantics preserved?
>
> Willem> They aren't. This is purely an optimization to amortize the cost of
> Willem> repeated tx stack traversal. Unlike UFO, which would preserve the
> Willem> boundaries of the original larger than MTU datagram.
>
> As I understand Willem's explanation, if I do a sendmsg of 2000 bytes,
> - classic UDP will send 2 IP fragments, the first one with a full UDP
>   header, and the IP header indicating that this is the first frag for
>   that ipid, with more frags to follow. The second frag will have the
>   rest with the same ipid, it will not have a udp header,
>   and it will indicate that it is the last frag (no more frags).
>
>   The receiver can thus use the ipid, "more-frags" bit, frag offset etc
>   to stitch the 2000 byte udp message together and pass it up on the udp
>   socket.
>
> - in the "GSO" proposal my 2000 bytes of data are sent as *two*
>   udp packets, each of them with a unique udp header, and uh_len set
>   to 1476 (for first) and 526 (for second). The receiver has no clue
>   that they are both part of the same UDP datagram. So the wire format
>   is not the same, am I mistaken?

Eric is correct. If the application sets a segment size with UDP_SEGMENT,
this is an instruction to the kernel to split the payload at that boundary into
separate, discrete datagrams.

It does not matter what the behavior is without setting this option. If a
process wants to send a larger-than-MTU datagram and rely on the
kernel to fragment it, then it should not set the option.
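
To make the split concrete, here is a minimal sketch in Python. It is
illustrative only, not the patch set's code. It assumes the option is
exposed as UDP_SEGMENT with value 103 at the UDP socket level, as in
Linux's <linux/udp.h>; Python's socket module may not export that
constant, so it is defined by hand. segment_sizes() and send_segmented()
are hypothetical helper names, and the payload is assumed to be
non-empty.

```python
import socket

# Assumption: UDP_SEGMENT is option 103 at the UDP level, as defined in
# Linux's <linux/udp.h>. Defined by hand because Python's socket module
# may not export it.
UDP_SEGMENT = 103


def segment_sizes(payload_len, gso_size):
    """Return the payload size of each datagram a split at gso_size yields.

    Every datagram carries gso_size bytes except possibly the last,
    which carries the remainder.
    """
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    full, rest = divmod(payload_len, gso_size)
    return [gso_size] * full + ([rest] if rest else [])


def send_segmented(sock, payload, addr, gso_size):
    """Hypothetical helper: one system call, several datagrams on the wire.

    The option is set on the socket, so it applies to later sends too.
    A process that wants one larger-than-MTU datagram, fragmented at the
    IP layer, should not call this and should use plain sendto() instead.
    """
    sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, gso_size)
    return sock.sendto(payload, addr)
```

With a 2000 byte payload and a segment size of 1472, segment_sizes()
gives [1472, 528]: the receiver gets two independent datagrams from two
recvmsg calls, exactly as if the sender had issued two sendmsg calls.
Nothing on the wire ties them together.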
