Every write to a Linux socket is copied by the kernel into the 2 KiB
buffers used by SKBs.  The data is copied to somewhere in the middle of that
2 KiB buffer, so that the kernel can prepend the TCP/IP headers.  Even with
TCP Segmentation Offload, 2 KiB buffers are still used; the TCP/IP headers
only need to be calculated once for an array of buffers, and then the
kernel puts an array of pointers in the network card's ring buffer.
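
To illustrate the "copy into the middle of the buffer" point, here is a toy
model in Python.  It is only a sketch, not kernel code: the ToySkb class, the
128-byte HEADROOM value, and the fake header strings are all made up for
illustration.  The idea it shows is that the payload is written once, after
some reserved headroom, and each header is then prepended by moving the
start offset backwards, without touching the payload again.

```python
BUF_SIZE = 2048   # the 2 KiB buffer size mentioned above
HEADROOM = 128    # assumed value, chosen only for this illustration


class ToySkb:
    """Toy stand-in for a socket buffer with reserved headroom."""

    def __init__(self, payload: bytes):
        if len(payload) > BUF_SIZE - HEADROOM:
            raise ValueError("payload does not fit in one buffer")
        self.buf = bytearray(BUF_SIZE)
        # The payload is copied once, starting after the headroom.
        self.data = HEADROOM
        self.tail = HEADROOM + len(payload)
        self.buf[self.data:self.tail] = payload

    def push(self, header: bytes) -> None:
        """Prepend a header by moving the data offset backwards."""
        if len(header) > self.data:
            raise ValueError("not enough headroom left")
        self.data -= len(header)
        self.buf[self.data:self.data + len(header)] = header

    def packet(self) -> bytes:
        """Return the bytes that would go on the wire."""
        return bytes(self.buf[self.data:self.tail])


skb = ToySkb(b"hello")
skb.push(b"TCP|")   # fake transport header
skb.push(b"IP|")    # fake network header, prepended in front of it
print(skb.packet())
```

The payload stays at the same offset throughout; only the start offset
changes as headers are added.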

The kernel will only put on the wire as much data as the current TCP
congestion window says, but it has to keep each packet in its buffers
until the remote side ACKs that packet.
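
A rough way to see this buffering from userspace is sketched below, in
Python.  The function name fill_send_buffer and the chunk size and limit are
my own choices for the example.  It opens a TCP connection over loopback,
never reads on the receiving end, and counts how many bytes a non-blocking
send() accepts before the kernel refuses more.  Note that on loopback the
total includes the peer's receive buffer as well as the sender's send
buffer, so the number is only indicative; the point is that send() succeeds
as soon as the kernel has taken its own copy of the data.

```python
import socket


def fill_send_buffer(chunk_size=4096, limit=64 * 1024 * 1024):
    """Count bytes the kernel accepts while the peer never calls recv()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()     # accepted, but never read from
    client.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while sent < limit:           # safety cap for the sketch
            try:
                sent += client.send(chunk)
            except BlockingIOError:
                break                 # kernel buffers are full
        return sent
    finally:
        client.close()
        server.close()
        listener.close()


if __name__ == "__main__":
    print("bytes accepted by the kernel:", fill_send_buffer())
```

The exact figure depends on the kernel's socket buffer settings, so I have
not given an expected number.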

On Mon, Feb 27, 2017 at 2:25 PM, William A Rowe Jr <wr...@rowe-clan.net>
wrote:

> On Mon, Feb 27, 2017 at 12:16 PM, Jacob Champion <champio...@gmail.com>
> wrote:
> >
> > On 02/23/2017 04:48 PM, Yann Ylavic wrote:
> >> On Wed, Feb 22, 2017 at 8:55 PM, Daniel Lescohier wrote:
> >>>
> >>>
> >>> IOW: read():Three copies: copy from filesystem cache to httpd
> >>> read() buffer to encrypted-data buffer to kernel socket buffer.
> >
> >>
> >> Not really, "copy from filesystem cache to httpd read() buffer" is
> >> likely mapping to userspace, so no copy (on read) here.
> >
> > Oh, cool. Which kernels do this? It seems like the VM tricks would have to
> > be incredibly intricate for this to work; reads typically don't happen in
> > page-sized chunks, nor to aligned addresses. Linux in particular has
> > comments in the source explaining that they *don't* do it for other syscalls
> > (e.g. vmsplice)... but I don't have much experience with non-Linux systems.
>
> I don't understand this claim.
>
> If read() returned an API-provisioned buffer, it could point wherever it liked,
> including a 4k page. As things stand the void* (or char*) of the read() buffer
> is at an arbitrary offset, no common OS I'm familiar with maps a page to
> a non-page-aligned address.
>
> The kernel socket send[v]() call might avoid copy in the direct-send case,
> depending on the implementation.
>
