On Tue, Jun 28, 2022 at 1:53 PM Ivan Zhakov <i...@visualsvn.com> wrote:
>
> On Tue, 28 Jun 2022 at 14:08, <yla...@apache.org> wrote:
> >
> > Author: ylavic
> > Date: Tue Jun 28 11:08:32 2022
> > New Revision: 1902312
> >
> > URL: http://svn.apache.org/viewvc?rev=1902312&view=rev
> > Log:
> > apr_socket_sendv: WIN32: Limit the number of WSABUFs allocated for a single 
> > call.
> >
> > * network_io/win32/sendrecv.c():
> >   Define WSABUF_ON_HEAP to 500.
> >
> > * network_io/win32/sendrecv.c(apr_socket_sendv):
> >   Do not sendv more than WSABUF_ON_HEAP WSABUFs.
> >
> This commit changes the behavior for blocking sockets: now
> apr_socket_sendv() may result in a partial write.

It has always been like that for sockets: write[v]() has non-partial
guarantees on some systems only (Linux, Windows?), *and* only for regular
files (and, IIRC, for pipes up to PIPE_BUF).
There can't be any such guarantee for communications with a negotiated
window size or for datagrams: if you try to send 10GB of data in a single
call on a TCP socket, it won't block until all the data has been sent
(unless you tune SO_SNDBUF?), let alone UDP.
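
FWIW, a caller that wants "send it all" semantics has to loop anyway,
something like the below (untested sketch, not something APR promises):

#include "apr_network_io.h"

/* Keep calling apr_socket_sendv() until the whole iovec array has been
 * written, adjusting the array after each partial write (blocking
 * socket assumed; `vec` is modified in place). */
static apr_status_t sendv_all(apr_socket_t *sock, struct iovec *vec,
                              apr_int32_t nvec)
{
    while (nvec > 0) {
        apr_size_t written;
        apr_status_t rv = apr_socket_sendv(sock, vec, nvec, &written);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        /* Skip the iovecs that were fully sent... */
        while (nvec > 0 && written >= vec->iov_len) {
            written -= vec->iov_len;
            ++vec;
            --nvec;
        }
        /* ...and advance into the one that was sent partially. */
        if (nvec > 0) {
            vec->iov_base = (char *)vec->iov_base + written;
            vec->iov_len -= written;
        }
    }
    return APR_SUCCESS;
}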

>
> As far as I see, apr_socket_sendv() does not explicitly document whether
> partial writes are allowed for *blocking* sockets. Posix send() seems
> to send all data [1]:
> [[
> When the message does not fit into the send buffer of the socket,
> send() normally blocks, unless the socket has been placed in
> nonblocking I/O mode. In nonblocking mode it would fail with the error
> EAGAIN or EWOULDBLOCK in this case. The select(2) call may be used to
> determine when it is possible to send more data.
> ]]

This does not say it sends all the data: whenever the send buffer gets
some space again, the system sends what it can and returns that amount
(as opposed to buffering the remaining data and looping).
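
At the send(2) level that means the usual caller-side pattern (again just
a sketch, same idea as the apr_socket_sendv() loop above):

#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/* Keep calling send() with whatever the previous call did not take; a
 * blocking socket only guarantees that each call makes some progress. */
static int send_all(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, retry */
            return -1;             /* real error */
        }
        off += (size_t)n;
    }
    return 0;
}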

>
> Also even APR code assumes that apr_socket_sendv() doesn't allow
> partial writes. See apr_memcache_delete() for example:
> https://demo-server.visualsvn.com/!/#asf/view/r1902313/apr/apr/trunk/memcache/apr_memcache.c?line=923

I don't think that the change breaks this case where nvec == 3 (given
that WSABUF_ON_HEAP == 500).

>
> Not too sure that limiting the size of the malloc() is worth it in
> this case. What are the reasons to limit it?

On a 64-bit system, iov_len is a size_t (64 bits) while WSABUF->len is a
DWORD (32 bits), so a single iovec could require a really huge number of
WSABUFs (~ 2^32). So I first had:

#define WSABUF_ON_HEAP (APR_SIZE_MAX / sizeof(WSABUF))

which would have limited nvec to that hard limit only, and let
malloc(APR_SIZE_MAX) fail for pathological cases.
But I then thought that someone calling apr_socket_sendv() with more than
500 iovecs (an arbitrary value admittedly, but that's already 8KB worth of
WSABUFs) kind of deserves implicit partial writes...
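
To illustrate the splitting (not the committed code, just the idea): a
single iovec whose iov_len does not fit in a DWORD has to be spread over
several WSABUFs, and with the cap the conversion simply stops at
WSABUF_ON_HEAP entries, leaving the rest for the caller's next call:

#include <winsock2.h>
#include <limits.h>
#include "apr.h"   /* apr_size_t, apr_int32_t, struct iovec on Windows */

#define WSABUF_ON_HEAP 500

/* Convert at most WSABUF_ON_HEAP worth of iovecs (splitting the large
 * ones) and return how many WSABUFs were actually prepared. */
static DWORD fill_wsabufs(WSABUF *wsa, const struct iovec *vec,
                          apr_int32_t nvec)
{
    DWORD n = 0;
    apr_int32_t i;

    for (i = 0; i < nvec && n < WSABUF_ON_HEAP; ++i) {
        apr_size_t remaining = vec[i].iov_len;
        char *base = (char *)vec[i].iov_base;

        /* One iovec can span multiple WSABUFs if iov_len > ULONG_MAX. */
        while (remaining > 0 && n < WSABUF_ON_HEAP) {
            ULONG chunk = (remaining > ULONG_MAX) ? ULONG_MAX
                                                  : (ULONG)remaining;
            wsa[n].buf = base;
            wsa[n].len = chunk;
            base += chunk;
            remaining -= chunk;
            ++n;
        }
    }
    return n;
}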

>
> Another approach would be to split writes to multiple WSASend calls
> when iovec count is more than WSABUF_ON_STACK, but this would break
> implicit assumption that apr_socket_sendv() is atomic like writev [2].

There is no atomicity guarantee except for regular files and pipes (and
both probably only up to PIPE_BUF).

>
> PS: Do we have a test for such apr_socket_sendv() call?

Looks like we don't, besides some calls in testmemcache and testredis...
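
It shouldn't be hard to add one though, something along these lines
(untested sketch; setup_loopback_pair() and the global pool `p` are
placeholders for the usual test scaffolding, not existing helpers):

#include "apr_network_io.h"
#include "abts.h"

#define NVEC 600   /* > WSABUF_ON_HEAP (500) */

extern apr_pool_t *p;   /* the test suite's global pool */

/* Hypothetical helper: create a connected, blocking client/server pair
 * over the loopback, like the existing sendrecv tests do. */
void setup_loopback_pair(abts_case *tc, apr_pool_t *pool,
                         apr_socket_t **client, apr_socket_t **server);

static void test_sendv_many_iovecs(abts_case *tc, void *data)
{
    apr_socket_t *client, *server;
    struct iovec vec[NVEC];
    char byte = 'x';
    apr_size_t total = 0, written;
    apr_status_t rv;
    int i;

    setup_loopback_pair(tc, p, &client, &server);

    for (i = 0; i < NVEC; i++) {
        vec[i].iov_base = &byte;   /* 1 byte each, so bytes == iovecs */
        vec[i].iov_len = 1;
    }

    /* With r1902312 at most WSABUF_ON_HEAP iovecs go out per call, so a
     * partial write is expected and the caller loops on the remainder. */
    do {
        rv = apr_socket_sendv(client, vec + total,
                              (apr_int32_t)(NVEC - total), &written);
        ABTS_INT_EQUAL(tc, APR_SUCCESS, rv);
        ABTS_TRUE(tc, written > 0);
        total += written;
    } while (total < NVEC);

    ABTS_TRUE(tc, total == NVEC);
}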

>
> [1] https://linux.die.net/man/2/send
> [2] https://linux.die.net/man/2/writev

Regards;
Yann.
