On 22 June 2017 at 09:07, Andres Freund <and...@anarazel.de> wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
>> On 22 June 2017 at 08:29, Andres Freund <and...@anarazel.de> wrote:
>>
>> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
>>
>> That's likely worth doing, but can probably wait for a separate patch.
>
> I don't think so, we should get this right, it could have API influence.
>
>
>> The kernel will usually do some packet aggregation unless we use
>> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
>> is IMO not worth worrying about just yet.
>
> 1)
>     /*
>      * Select socket options: no delay of outgoing data for
>      * TCP sockets, nonblock mode, close-on-exec. Fail if any
>      * of this fails.
>      */
>     if (!IS_AF_UNIX(addr_cur->ai_family))
>     {
>         if (!connectNoDelay(conn))
>         {
>             pqDropConnection(conn, true);
>             conn->addr_cur = addr_cur->ai_next;
>             continue;
>         }
>     }
>
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>    being sent, because you start sending normal sized tcp packets,
>    rather than jumbo ones, even if configured (pretty common these
>    days).
>
> 3) Syscall overhead is actually quite significant.

Fair enough, and *headdesk* re not checking NODELAY. I thought I'd
checked for our use of that before, but I must've remembered wrong.

We could use TCP_CORK, but it's Linux-specific and so not portable.
It'd be better to just collect the outgoing messages in a buffer and
dispatch them with a single send().
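
To illustrate what collecting up a buffer could look like, here is a
rough sketch. It is not libpq code: OutBuf, outbuf_init, outbuf_append,
outbuf_flush and send_all are names made up for this example, the 8kB
size is arbitrary, and it assumes a blocking, connected stream socket.
The nsends counter exists only to show how many send() calls were made.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define OUTBUF_SIZE 8192        /* arbitrary size chosen for the sketch */

/* Hypothetical coalescing buffer; not libpq's actual structure. */
typedef struct OutBuf
{
    int     fd;                 /* connected, blocking stream socket */
    size_t  used;               /* bytes queued but not yet sent */
    int     nsends;             /* successful send() calls, for illustration */
    char    data[OUTBUF_SIZE];
} OutBuf;

static void
outbuf_init(OutBuf *ob, int fd)
{
    ob->fd = fd;
    ob->used = 0;
    ob->nsends = 0;
}

/* Send exactly len bytes, retrying on EINTR and on short writes. */
static int
send_all(OutBuf *ob, const char *p, size_t len)
{
    while (len > 0)
    {
        ssize_t n = send(ob->fd, p, len, 0);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        ob->nsends++;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/*
 * Push everything queued so far to the socket. On error the buffer is
 * left untouched; this sketch assumes the caller then drops the
 * connection rather than retrying.
 */
static int
outbuf_flush(OutBuf *ob)
{
    if (send_all(ob, ob->data, ob->used) < 0)
        return -1;
    ob->used = 0;
    return 0;
}

/* Queue a message; touch the socket only when the buffer cannot hold it. */
static int
outbuf_append(OutBuf *ob, const void *msg, size_t len)
{
    /* Not enough room left: push out what is queued first. */
    if (len > OUTBUF_SIZE - ob->used && outbuf_flush(ob) < 0)
        return -1;

    /* Larger than the whole buffer: nothing to gain from copying it. */
    if (len > OUTBUF_SIZE)
        return send_all(ob, msg, len);

    memcpy(ob->data + ob->used, msg, len);
    ob->used += len;
    return 0;
}
```

The caller would append each small message as it is produced and call
outbuf_flush() once at the point where it would otherwise wait for a
reply, so a burst of tiny messages costs one syscall instead of one per
message, whether or not TCP_NODELAY is set.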

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)