On Wed, May 23, 2018 at 8:02 PM, Marcelo Ricardo Leitner
<[email protected]> wrote:
> On Wed, Apr 18, 2018 at 09:49:18AM -0400, Willem de Bruijn wrote:
>> I just hacked up a sendmmsg extension to the benchmark to verify.
>> Indeed that does not have nearly the same benefit as GSO:
>>
>> udp tx:    976 MB/s   695394 calls/s  16557 msg/s
>>
>> This matches the numbers seen from TCP without TSO and GSO.
>> That also has few system calls, but observes per MTU stack traversal.
>
> Reviving this old thread because it's the only place I saw sendmmsg
> being mentioned.
>
> sendmmsg shouldn't be considered as an alternative, but rather as a
> complement. Then instead of the application building one large request
> and requesting the stack to fragment it, it could simply build the
> sendmmsg request and the stack would group the mmsg into a gso skb. It
> seems more natural to the application. But well, both (sendmmsg and
> the option to fragment) are Linux-specific..
>
> For that we need sendmmsg to do something smarter than doing several
> sendmsg calls, yes.

I agree. See also my original point:

"An alternative implementation that would allow non-uniform
  segment length is to use GSO_BY_FRAGS like SCTP. This would
  likely require MSG_MORE to build the list using multiple
  send calls (or one sendmmsg). The two approaches are not
  mutually-exclusive, so that could be a follow-up."

Clear advantages of GSO_BY_FRAGS are that segments do
not have to be of equal length and that converting existing users
of sendmmsg is trivial.
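
For reference, a minimal sketch of what such an existing sendmmsg user
looks like from userspace, with datagrams of differing lengths. The
helper names build_batch and send_batch and the BATCH_MAX limit are
made up for illustration. Only sendmmsg(2) and struct mmsghdr are real
interfaces. As things stand each message still traverses the stack on
its own; grouping them into one GSO_BY_FRAGS skb is the idea under
discussion, not something this code triggers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes sendmmsg() under _GNU_SOURCE */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH_MAX 16		/* illustrative limit, not a kernel constant */

/*
 * Hypothetical helper: point each mmsghdr at exactly one buffer.
 * Lengths may differ per datagram. Returns the total payload bytes.
 */
size_t build_batch(struct mmsghdr *msgs, struct iovec *iov,
		   char *const bufs[], const size_t lens[], unsigned int n)
{
	size_t total = 0;
	unsigned int i;

	memset(msgs, 0, n * sizeof(*msgs));
	for (i = 0; i < n; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = lens[i];
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
		total += lens[i];
	}
	return total;
}

/*
 * Hypothetical helper: send n datagrams on a connected UDP socket with
 * a single system call. Returns the number of datagrams sent, or -1
 * with errno set.
 */
int send_batch(int fd, char *const bufs[], const size_t lens[],
	       unsigned int n)
{
	struct mmsghdr msgs[BATCH_MAX];
	struct iovec iov[BATCH_MAX];

	assert(n <= BATCH_MAX);
	build_batch(msgs, iov, bufs, lens, n);
	return sendmmsg(fd, msgs, n, 0);
}
```

The application side would stay exactly like this under a
GSO_BY_FRAGS scheme, which is why conversion looks trivial: the
per-datagram lengths are already expressed in the iovecs.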

On the other hand, this is less likely to be offloaded to hardware,
as it requires non-constant metadata in the descriptor.

Both cases also potentially apply to the GRO path, to allow for
efficient forwarding, and to the UDP socket rx layer, to allow for
queuing batches of datagrams at a time, then carving off one
gso_size per recvmsg.
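
To make the rx-side carving concrete, here is a rough userspace model
of the uniform-size case. It is not kernel code: struct dgram_batch
and batch_carve are hypothetical names, and a real implementation
would operate on skbs rather than a flat buffer. It only shows the
arithmetic: every read gets gso_size bytes, except the last one, which
gets whatever remains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one queued batch of equal-sized datagrams. */
struct dgram_batch {
	const char *data;	/* start of the coalesced payload */
	size_t len;		/* total payload bytes in the batch */
	size_t off;		/* bytes already handed to the reader */
	size_t gso_size;	/* size of every segment but possibly the last */
};

/*
 * Carve the next datagram off the batch, as one recvmsg would.
 * Stores the segment start in *seg and returns its length, or 0 once
 * the batch is drained (or if gso_size is 0).
 */
size_t batch_carve(struct dgram_batch *b, const char **seg)
{
	size_t left, n;

	assert(b->off <= b->len);
	left = b->len - b->off;
	if (left == 0 || b->gso_size == 0)
		return 0;

	n = left < b->gso_size ? left : b->gso_size;
	*seg = b->data + b->off;
	b->off += n;
	return n;
}
```

With a 10 byte batch and gso_size 4, successive calls return 4, 4 and
2 bytes, then 0. A GSO_BY_FRAGS variant would need a per-segment
length list in place of the single gso_size field.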
