In a message dated 01-08-17 17:51:20 EDT, Bill Stoddard wrote...
> > > I've replaced the malloc/frees in the bucket code on Windows with a
> > > power of 2 allocator and it makes a BIG difference in performance.
> > > I expect the same on every OS with the exception of Linux.
> >
> > Kevin Kiley wrote...
> >
> > You might want to add the same 'power of 2' stuff to a call
> > that is absolutely killing the windows version.
> >
> > apr_sendv() in /srclib/apr/network_io/win32/sendrecv.c
> > is using a malloc()/free() combo for only 8 bytes!
> > It's using WSASend() and can 'scatter gather' but so far
> > I've never seen more than 2 separate buffers to 'gather'
> > on any particular call to apr_sendv().
> >
> > There is a comment there about 'putting it on the stack' but
> > you'd be better off doing this sooner than later because
> > the malloc()/free() for only 8 bytes is killing you on every
> > transmit.
> >
>
> I'll do it this weekend.
> Thanks,
>
> Bill
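
For anyone following along who has not met the term: a 'power of 2
allocator' usually means rounding each request up to a power-of-two
size class and recycling freed blocks from a per-class free list,
so the common case never reaches malloc()/free(). The sketch below
only illustrates that general idea. It is NOT the bucket allocator
Bill describes, and the names p2_alloc(), p2_free(), size_class()
and the class limits are all made up for this example. It is also
not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_SHIFT   4   /* smallest class is 16 bytes (hypothetical) */
#define NUM_CLASSES 12  /* classes cover 16 bytes .. 32 KB (hypothetical) */

/* A freed block is reused in place as a free-list node. */
struct free_node { struct free_node *next; };

/* Header in front of every block; the union members force alignment. */
union block_hdr { size_t cls; long double align_ld; void *align_p; };

static struct free_node *free_lists[NUM_CLASSES];

/* Smallest class whose block holds n bytes, or -1 if n is too large. */
static int size_class(size_t n)
{
    int cls = 0;
    size_t cap = (size_t)1 << MIN_SHIFT;
    while (cap < n) {
        cap <<= 1;
        if (++cls >= NUM_CLASSES)
            return -1;
    }
    return cls;
}

static void *p2_alloc(size_t n)
{
    int cls = size_class(n);
    union block_hdr *hdr;

    if (cls < 0) {
        /* Oversized request: plain malloc(), tagged so p2_free() knows. */
        hdr = malloc(sizeof(*hdr) + n);
        if (hdr == NULL)
            return NULL;
        hdr->cls = NUM_CLASSES;
        return hdr + 1;
    }
    if (free_lists[cls] != NULL) {
        /* Fast path: pop a recycled block; its header is still intact. */
        struct free_node *node = free_lists[cls];
        free_lists[cls] = node->next;
        return node;
    }
    hdr = malloc(sizeof(*hdr) + ((size_t)1 << (MIN_SHIFT + cls)));
    if (hdr == NULL)
        return NULL;
    hdr->cls = (size_t)cls;
    return hdr + 1;
}

static void p2_free(void *p)
{
    union block_hdr *hdr;
    struct free_node *node;

    if (p == NULL)
        return;
    hdr = (union block_hdr *)p - 1;
    if (hdr->cls >= NUM_CLASSES) {  /* oversized block: give it back */
        free(hdr);
        return;
    }
    assert(hdr->cls < NUM_CLASSES);
    node = p;                       /* push onto the class free list */
    node->next = free_lists[hdr->cls];
    free_lists[hdr->cls] = node;
}
```

The trade-off is the usual one: some memory is wasted to rounding
and blocks are never returned to the system, in exchange for a
very cheap alloc/free pair on the hot path.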

FWIW... I did a quick test here and easily added more than
60 transactions per second on a 10000 hit simple home
page grab crunch test with 2.0.24.

Here is the code I used to change the WSASend() allocations
to use the stack rather than malloc()/free().

NOTE: The code below still preserves the ability to malloc() if some
absurd number of scatter gather 'nvec' entries arrives. The reason
for the 2 execution paths is speed: it limits the 'if' logic to
a single test per call. Even with the way the filtering
can fragment things I really doubt it will ever need
more than 500 iovecs on any one call, but who knows.

I am sure you will do whatever you want anyway so this
apr_sendv() rewrite is obviously just a suggestion...

[snip]
APR_DECLARE(apr_status_t) apr_sendv(apr_socket_t *sock,
                                    const struct iovec *vec,
                                    apr_int32_t nvec, apr_size_t *nbytes)
{
    apr_ssize_t rv;
    int i;
    int lasterror;
    DWORD dwBytes = 0;

    /* sizeof(WSABUF) is 8 bytes on 32-bit Windows so 500 entries */
    /* takes 4k. Declaring the array as WSABUF rather than char   */
    /* keeps the size and alignment right on other targets too.   */
#define APR_SENDV_MAX_STACK_BUFFERS 500
    WSABUF wsabuf_stack_buffer[ APR_SENDV_MAX_STACK_BUFFERS ];
    LPWSABUF pWsaData_on_the_stack = &wsabuf_stack_buffer[0];
    LPWSABUF pWsaData; /* Only used on the malloc() path below. */

    /* Use 2 separate execution paths for speed and avoid */
    /* over-use of 'if' statements around 'free()' calls... */
    /* There's probably no way anyone is ever going to */
    /* need more than 500 WSABUF records so put the most used */
    /* condition FIRST for fastest overall pickup time... */
    if ( nvec <= APR_SENDV_MAX_STACK_BUFFERS )
    {
        /* No need to malloc()... just use the stack... */
        for (i = 0; i < nvec; i++)
        {
            pWsaData_on_the_stack[i].buf = vec[i].iov_base;
            pWsaData_on_the_stack[i].len = vec[i].iov_len;
        }
        rv = WSASend(
            sock->sock, pWsaData_on_the_stack, nvec, &dwBytes, 0, NULL, NULL);
        if (rv == SOCKET_ERROR) {
            lasterror = apr_get_netos_error();
            return lasterror;
        }
    }
    else /* Ouch! There are more than 500 'scatter gather' buffers! */
    {
        /* Only thing we can do is use malloc() if there are that */
        /* many actual buffers to 'gather'... */
        pWsaData = (LPWSABUF) malloc(sizeof(WSABUF) * nvec);
        if (!pWsaData)
        {
            return APR_ENOMEM;
        }
        for (i = 0; i < nvec; i++)
        {
            pWsaData[i].buf = vec[i].iov_base;
            pWsaData[i].len = vec[i].iov_len;
        }
        rv = WSASend(sock->sock, pWsaData, nvec, &dwBytes, 0, NULL, NULL);
        if (rv == SOCKET_ERROR) {
            lasterror = apr_get_netos_error();
            free(pWsaData);
            return lasterror;
        }
        free(pWsaData);
    }/* End 'else( malloc()/free() was needed )' */

    *nbytes = dwBytes;
    return APR_SUCCESS;
}
[snip]
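
The same 'stack first, malloc() only if you must' idea can be tried
outside of Windows with a small portable sketch. Everything named
here is a stand-in invented for the example: struct src_vec plays
the role of struct iovec, struct dst_buf the role of WSABUF,
fake_send() the role of WSASend(), and MAX_STACK_BUFS is kept tiny
(the post uses 500) so the fallback is easy to trigger. Note that
this is a variant of the rewrite above, not the same thing: it
shares one copy loop between both cases and pays for that with one
pointer comparison before free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct iovec and WSABUF. */
struct src_vec { void *base; size_t len; };
struct dst_buf { unsigned long len; char *buf; };

#define MAX_STACK_BUFS 16  /* illustrative only; the post uses 500 */

/* Hypothetical stand-in for WSASend(): totals the bytes described. */
static size_t fake_send(const struct dst_buf *bufs, int n)
{
    size_t total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += bufs[i].len;
    return total;
}

/* Returns 0 on success, -1 if the heap fallback could not allocate. */
static int send_gathered(const struct src_vec *vec, int nvec, size_t *nbytes)
{
    struct dst_buf stack_bufs[MAX_STACK_BUFS];
    struct dst_buf *bufs = stack_bufs;  /* common case: no allocation */
    int i;

    assert(nvec >= 0);
    if (nvec > MAX_STACK_BUFS) {
        /* Rare case: too many buffers for the stack array. */
        bufs = malloc(sizeof(*bufs) * (size_t)nvec);
        if (bufs == NULL)
            return -1;
    }
    for (i = 0; i < nvec; i++) {
        bufs[i].buf = vec[i].base;
        bufs[i].len = (unsigned long)vec[i].len;
    }
    *nbytes = fake_send(bufs, nvec);
    if (bufs != stack_bufs)             /* free only what was malloc()ed */
        free(bufs);
    return 0;
}
```

Calling send_gathered() with 2 entries never touches the heap,
while a call with more than MAX_STACK_BUFS entries takes the
malloc() path and frees the array before returning.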