On 4/18/2018 6:51 AM, Willem de Bruijn wrote:
On Wed, Apr 18, 2018 at 9:47 AM, Sowmini Varadhan
<sowmini.varad...@oracle.com> wrote:
On (04/18/18 06:35), Eric Dumazet wrote:
There is no change at all.

This will only be used as a mechanism to send X packets of same size.

So instead of X system calls, one system call.

One traversal of some expensive part of the host stack.

The content on the wire should be the same.
I'm sorry that's not how I interpret Willem's email below
(and maybe I misunderstood)

the following taken from https://www.spinics.net/lists/netdev/msg496150.html

Sowmini> If yes, how will the recvmsg differentiate between the case
Sowmini> (2000 byte message followed by 512 byte message) and
Sowmini> (1472 byte message, 526 byte message, then 512 byte message),
Sowmini> in other words, how are UDP message boundary semantics preserved?

Willem> They aren't. This is purely an optimization to amortize the cost of
Willem> repeated tx stack traversal. Unlike UFO, which would preserve the
Willem> boundaries of the original larger than MTU datagram.

As I understand Willem's explanation, if I do a sendmsg of 2000 bytes,
- classic UDP will send 2 IP fragments, the first one with a full UDP
   header, and the IP header indicating that this is the first frag for
   that ipid, with more frags to follow. The second frag will have the
   rest with the same ipid, it will not have a udp header,
   and it will indicate that it is the last frag (no more frags).

   The receiver can thus use the ipid, "more-frags" bit, frag offset etc
   to stitch the 2000 byte udp message together and pass it up on the udp
   socket.

- in the "GSO" proposal my 2000 bytes of data are sent as *two*
   udp packets, each of them with a unique udp header, and uh_len set
   to 1476 (for first) and 526 (for second). The receiver has no clue
   that they are both part of the same UDP datagram. So the wire format
   is not the same, am I mistaken?
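
To make the comparison above concrete, here is a minimal sketch of what each
case would put on the wire. It assumes IPv4 with a 20-byte header (no
options), an 8-byte UDP header, and a 1500-byte MTU. The function names
ip_fragments and udp_segments are made up for this illustration and are not
kernel code. The byte counts it computes follow from those assumptions, so
they differ slightly from the round figures quoted in the thread.

```python
IP_HDR = 20   # assumption: IPv4 header without options
UDP_HDR = 8   # UDP header size


def ip_fragments(data_len, mtu=1500):
    """Classic UDP: one datagram (one UDP header), split into IP fragments."""
    total = UDP_HDR + data_len            # bytes the IP layer has to carry
    per_frag = (mtu - IP_HDR) // 8 * 8    # non-final fragments carry a multiple of 8
    frags = []
    offset = 0
    while offset < total:
        size = min(per_frag, total - offset)
        frags.append({
            "offset": offset,                      # fragment offset in bytes
            "ip_payload": size,
            "more_frags": offset + size < total,   # "more fragments" bit
            "udp_header": offset == 0,             # only the first fragment has one
        })
        offset += size
    return frags


def udp_segments(data_len, seg_size):
    """Segmentation: independent datagrams, each with its own UDP header."""
    segs = []
    for start in range(0, data_len, seg_size):
        payload = min(seg_size, data_len - start)
        segs.append({"payload": payload, "udp_len": UDP_HDR + payload})
    return segs


if __name__ == "__main__":
    for frag in ip_fragments(2000):
        print("fragment:", frag)
    for seg in udp_segments(2000, 1472):
        print("segment: ", seg)
```

In the first case the receiver reassembles the fragments and delivers one
2000-byte message. In the second, nothing on the wire ties the two datagrams
together, so the receiver sees two separate messages.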
Eric is correct. If the application sets a segment size with UDP_SEGMENT
this is an instruction to the kernel to split the payload along that border into
separate discrete datagrams.

OK. So the sender app is passing the message boundary info to the kernel via
the socket option and letting the kernel split the large payload into multiple
UDP segments.

It does not matter what the behavior is without setting this option. If a
process wants to send a larger than MTU datagram and rely on the
kernel to fragment, then it should not set the option.
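
As a rough sketch of how an application might opt in, the snippet below sets
the segment size on a UDP socket. It assumes a Linux kernel that implements
UDP_SEGMENT. The numeric value 103 is taken from linux/udp.h and defined
locally, because the Python socket module may not export the constant. The
helper name set_segment_size is made up for this illustration.

```python
import socket

UDP_SEGMENT = 103  # assumption: value from linux/udp.h, defined locally


def set_segment_size(sock, size):
    """Ask the kernel to split each send on this socket into datagrams of
    at most `size` payload bytes. Returns False if the kernel refuses,
    for example because it does not know the option."""
    try:
        sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, size)
        return True
    except OSError:
        return False
```

With the option set to 1472, a single send of 2000 bytes would be expected to
leave the host as two datagrams of 1472 and 528 payload bytes. Without the
option, the same send remains one datagram that the kernel fragments at the
IP layer.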
