Todd Lipcon created KUDU-2758:
---------------------------------
Summary: TLS socket writes in 16kb chunks with intervening epoll/setsockopt syscalls
Key: KUDU-2758
URL: https://issues.apache.org/jira/browse/KUDU-2758
Project: Kudu
Issue Type: Bug
Components: perf, rpc, security
Reporter: Todd Lipcon
I noticed that krpc has the following syscall pattern:

{code}
rpc reactor-231 23122 [002] 35488410.994309: syscalls:sys_enter_epoll_wait: epfd: 0x00000007, events: 0x02137520, maxevents: 0x00000040, timeout: 0x00000050
rpc reactor-231 23122 [002] 35488410.994310: syscalls:sys_exit_epoll_wait: 0x1
rpc reactor-231 23122 [002] 35488410.994313: syscalls:sys_enter_setsockopt: fd: 0x00000011, level: 0x00000006, optname: 0x00000003, optval: 0x7fc80910175c, optlen: 0x00000004
rpc reactor-231 23122 [002] 35488410.994314: syscalls:sys_exit_setsockopt: 0x0
rpc reactor-231 23122 [002] 35488410.994351: syscalls:sys_enter_write: fd: 0x00000011, buf: 0x7fc7e8059e93, count: 0x0000401d
rpc reactor-231 23122 [002] 35488410.994370: syscalls:sys_exit_write: 0x401d
rpc reactor-231 23122 [002] 35488410.994372: syscalls:sys_enter_setsockopt: fd: 0x00000011, level: 0x00000006, optname: 0x00000003, optval: 0x7fc80910175c, optlen: 0x00000004
rpc reactor-231 23122 [002] 35488410.994378: syscalls:sys_exit_setsockopt: 0x0
{code}
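The raw trace arguments can be decoded with the standard Linux socket constants: level 0x6 is IPPROTO_TCP and optname 0x3 is TCP_CORK, so the setsockopt calls bracketing each write are the CORK and UNCORK steps. The write size 0x401d is 16413 bytes, which is 29 bytes more than the 16384-byte maximum TLS record payload. The Python sketch below only checks that arithmetic. The fallback value for TCP_CORK on non-Linux hosts is an assumption, and attributing the extra 29 bytes to per-record header and cipher overhead is an inference that depends on the negotiated cipher suite.

```python
import socket

# Argument values copied from the perf trace (perf prints them in hex).
SETSOCKOPT_LEVEL = 0x00000006
SETSOCKOPT_OPTNAME = 0x00000003
WRITE_COUNT = 0x0000401D

# Maximum plaintext payload of a single TLS record: 2**14 bytes.
TLS_MAX_RECORD_PAYLOAD = 16 * 1024


def decode_trace():
    """Return (level is IPPROTO_TCP, optname is TCP_CORK, write size, bytes over 16KB)."""
    level_is_tcp = SETSOCKOPT_LEVEL == socket.IPPROTO_TCP
    # socket.TCP_CORK is only defined on Linux; assume its Linux value (3) elsewhere.
    tcp_cork = getattr(socket, "TCP_CORK", 3)
    optname_is_cork = SETSOCKOPT_OPTNAME == tcp_cork
    extra = WRITE_COUNT - TLS_MAX_RECORD_PAYLOAD
    return level_is_tcp, optname_is_cork, WRITE_COUNT, extra


print(decode_trace())  # (True, True, 16413, 29)
```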
This block of syscalls repeats in a tight loop: epoll_wait, CORK, write, UNCORK.
The writes are always 0x401d bytes (16413 bytes, just over 16KB). I found the
following in the SSL_write(3) man page:
{quote}
SSL_write() will only return with success, when the complete contents of buf of
length num has been written. This default behaviour can be changed with the
SSL_MODE_ENABLE_PARTIAL_WRITE option of SSL_CTX_set_mode(3). When this flag is
set, SSL_write() will also return with success, when a partial write has been
successfully completed. In this case the SSL_write() operation is considered
completed. The bytes are sent and a new SSL_write() operation with a new buffer
(with the already sent bytes removed) must be started. A partial write is
performed with the size of a message block, which is 16kB for SSLv3/TLSv1.
{quote}
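To illustrate the partial-write contract quoted above, here is a minimal Python model. fake_partial_ssl_write is a hypothetical stand-in for SSL_write() with SSL_MODE_ENABLE_PARTIAL_WRITE set: it accepts at most one 16384-byte record per call, and the caller starts a new write with the already-sent bytes removed. It touches neither a real socket nor OpenSSL.

```python
# Maximum plaintext carried by one TLS record (the "16kB message block").
RECORD_SIZE = 16 * 1024


def fake_partial_ssl_write(buf):
    """Hypothetical stand-in for SSL_write() with SSL_MODE_ENABLE_PARTIAL_WRITE:
    it succeeds after sending at most one record and returns the byte count."""
    return min(len(buf), RECORD_SIZE)


def write_all(buf):
    """Start a new write with the already-sent bytes removed until nothing is
    left, returning the number of bytes accepted by each call."""
    view = memoryview(buf)
    accepted = []
    while view:
        n = fake_partial_ssl_write(view)
        accepted.append(n)
        view = view[n:]  # drop the bytes that were already sent
    return accepted


print(write_all(b"x" * 40000))  # [16384, 16384, 7232]
```

Each call returns after one record, so any code that uncorks after a single SSL_write() call will uncork after every 16KB.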
It seems likely that we should loop on the writes before uncorking, continuing
until we either hit a temporary socket error (e.g. EAGAIN) or run out of data to write.
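A minimal sketch of the proposed pattern, assuming a non-blocking socket and partial-write semantics: cork once, keep writing until the pending data is exhausted or the socket reports EAGAIN, then uncork once. FakeTlsSocket, set_cork, write and flush are hypothetical names. The class simulates a limited send buffer and records each simulated syscall instead of calling setsockopt() or SSL_write(); this is not Kudu's actual implementation.

```python
import errno

# Maximum plaintext carried by one TLS record.
RECORD_SIZE = 16 * 1024


class FakeTlsSocket:
    """Hypothetical corked TCP socket wrapped in TLS. Every simulated
    syscall is appended to self.trace so the resulting pattern can be inspected."""

    def __init__(self, send_capacity):
        self.send_capacity = send_capacity  # bytes the kernel buffer will accept
        self.trace = []

    def set_cork(self, on):
        self.trace.append("CORK" if on else "UNCORK")

    def write(self, data):
        """Partial-write semantics: at most one record per call; raises
        EAGAIN (as BlockingIOError) once the simulated send buffer is full."""
        if self.send_capacity == 0:
            raise BlockingIOError(errno.EAGAIN, "would block")
        n = min(len(data), RECORD_SIZE, self.send_capacity)
        self.send_capacity -= n
        self.trace.append(f"write({n})")
        return n


def flush(sock, pending):
    """Cork once, write until the data runs out or the socket would block,
    then uncork once. Returns the number of bytes left unsent."""
    view = memoryview(pending)
    sock.set_cork(True)
    try:
        while view:
            try:
                n = sock.write(view)
            except BlockingIOError:
                break  # temporary error: resume on the next epoll wakeup
            view = view[n:]
    finally:
        sock.set_cork(False)
    return len(view)


sock = FakeTlsSocket(send_capacity=40000)
left = flush(sock, b"x" * 50000)
print(sock.trace)  # ['CORK', 'write(16384)', 'write(16384)', 'write(7232)', 'UNCORK']
print(left)        # 10000
```

With a 40000-byte send buffer and 50000 bytes pending, the loop issues three writes under a single CORK/UNCORK pair and leaves 10000 bytes for the next epoll wakeup, rather than paying one epoll_wait and two setsockopt calls per 16KB record.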