On 15.09.2010 17:19, Bjoern A. Zeeb wrote:
On Mon, 13 Sep 2010, Andre Oppermann wrote:

Hey,

When a TCP connection via loopback back to localhost is made the whole
send, segmentation and receive path (with larger packets though) is still
executed. This has some considerable overhead.

To short-circuit the send and receive sockets on localhost TCP connections
I've made a proof-of-concept patch that directly places the data in the
other side's socket buffer without doing any packetization and other protocol
overhead (like UNIX domain sockets). The connection setup (SYN, SYN-ACK,
ACK) and shutdown are still handled by normal TCP segments via loopback so
that firewalling still works. The actual payload data during the session
won't be seen and the sequence numbers don't move other than for SYN and FIN.
The sequence numbers remain valid though. Obviously tcpdump won't see any data
transfers either if the connection has fused sockets.
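
To make the idea above concrete, here is a toy userspace model of fused
sockets. It is only a sketch and not the patch itself: the names ToySocket,
send_control, fuse and defuse are hypothetical, and real socket buffers,
locking and mbufs are ignored entirely. It shows payload landing directly in
the peer's receive buffer while only SYN and FIN advance the sequence number.

```python
# Toy model of "fused" sockets. NOT the FreeBSD patch; every name here
# (ToySocket, send_control, fuse, defuse) is hypothetical.

class ToySocket:
    def __init__(self, iss):
        self.snd_nxt = iss           # next send sequence number
        self.snd_buf = bytearray()   # send buffer (stays empty while fused)
        self.rcv_buf = bytearray()   # receive buffer
        self.peer = None             # fused peer socket, or None
        self.segments_sent = 0       # segments that really went via loopback

    def send_control(self):
        """SYN or FIN: still a real segment, consumes one sequence number."""
        self.snd_nxt += 1
        self.segments_sent += 1

    def send(self, data):
        if self.peer is not None:
            # Fused: append straight to the peer's receive buffer.
            # No segment is built and snd_nxt does not move.
            self.peer.rcv_buf += data
        else:
            # Unfused: the normal TCP path would segment and transmit this.
            self.snd_buf += data
        return len(data)

    def recv(self, n):
        data = bytes(self.rcv_buf[:n])
        del self.rcv_buf[:n]
        return data


def fuse(a, b):
    """Link two established sockets so payload bypasses the TCP path."""
    a.peer, b.peer = b, a


def defuse(a, b):
    """Undo fuse(); subsequent sends take the normal path again."""
    a.peer = b.peer = None
```

Because send() appends to a single buffer in call order and nothing is ever
left in snd_buf while fused, there is nothing in this model that could be
reordered relative to a later FIN.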

Preliminary testing (with WITNESS and INVARIANTS enabled) has shown stable
operation and a rough doubling of the throughput on loopback connections.
I've tested most socket teardown cases and it behaves fine. I'm not entirely
sure I've got all possible paths but the way it is integrated should properly
defuse the sockets in all situations.
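
For anyone who wants to reproduce a throughput comparison from userspace, a
rough benchmark could look like the sketch below. It is an assumption about
how one might measure this, not the method used for the numbers above; the
helper names (measure_throughput, tcp_loopback_pair, unix_pair) are made up
for illustration. It pushes a fixed number of bytes through a connected
stream socket pair and reports bytes per second, so the same function can be
pointed at a TCP loopback connection and at a UNIX domain socket pair.

```python
import socket
import threading
import time


def measure_throughput(client, server, total_bytes, chunk=64 * 1024):
    """Send total_bytes from client to server; return bytes per second."""
    received = 0

    def drain():
        # Reader side: consume until everything has arrived or EOF.
        nonlocal received
        while received < total_bytes:
            data = server.recv(chunk)
            if not data:
                break
            received += len(data)

    reader = threading.Thread(target=drain)
    payload = b"\0" * chunk
    start = time.perf_counter()
    reader.start()
    sent = 0
    while sent < total_bytes:
        n = min(chunk, total_bytes - sent)
        client.sendall(payload[:n])
        sent += n
    reader.join()
    elapsed = time.perf_counter() - start
    return received / elapsed


def tcp_loopback_pair():
    """Return a connected (client, server) TCP pair over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick one
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def unix_pair():
    """Return a connected UNIX domain stream socket pair."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
```

Calling measure_throughput(*tcp_loopback_pair(), 1 << 30) and
measure_throughput(*unix_pair(), 1 << 30) on the same machine gives two
figures to compare. Interpreter overhead is included in both, so only the
ratio is meaningful, and a kernel built with WITNESS and INVARIANTS will skew
the absolute values.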

Three comments in reverse order:

1 If the handshake (SYN, SYN-ACK, ACK) and shutdown aren't short-circuited,
can you always rely on proper payload order, especially in the shutdown case?

Yes.  The payload is always directly placed in the receive socket buffer
of the other socket, never in the send buffer.  There is never any unsent
data left in the send buffer that could become reordered.

2 Given my experience with epairs, which are basically a loop with two
interfaces and even interface queues, any significant delay you are
seeing is _not_ due to longer code paths through the stack but
simply because of the netisr.

I haven't measured delay, only bandwidth.  And that's with WITNESS and
INVARIANTS enabled.  You are probably right, the netisr is taking its
toll.  Especially the TCP_INFO lock may have some contention in the
loopback case on SMP.  Though a lot of mbuf allocations, packet manipulations
and instructions (instruction cache) are avoided by fusing the sockets
together.

3 If properly doing this for TCP, we should probably also do it for
other protocols.

UNIX domain sockets already do this.  This implementation is specific
to TCP and only touches the protocol-specific parts.  It's not done at
the socket layer.  For UDP it's not that easy to do as most UDP connections
are one-off packets and no permanent binding between two sockets exists.
For SCTP I don't know.  From glancing over the code it seems they have,
at least partially, their own socket buffer code.  How difficult a fused
socket would be there I can't say.

--
Andre
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
