On Fri, 24 Feb 2017, Yann Ylavic wrote:

The issue is potentially the huge (order big-n) allocations which
finally may hurt the system (fragmentation, OOM...).


Is this a real or theoretical problem?

Both. Fragmentation is a hard issue, but one constant holds: the more
you ask for big allocations, the more likely the request won't be
serviced one day or another (or another task will be killed to satisfy
it, until eventually yours is).

Long-lived systems (or ones close to their memory limits) suffer from
this no matter the allocator; small and large allocations alike
fragment memory (httpd is likely not the only program on the system).
The only "remedy" is small-order allocations (2^order pages, a "sane"
order being lower than 2, hence order-1 on a system with 4K pages is
8K bytes).

I've only seen this class of issues on Linux systems where vm.min_free_kbytes is left at its default in combination with better-than-GigE networking. Since we started tuning vm.min_free_kbytes to hold roughly 0.5 s of bursts at maximum network speed (i.e. 512M for 10GigE) we haven't seen it in production. Our working theory was that too little free space to absorb bursts left the kernel unable to figure out which file-cache pages to evict in time, but I don't think we ever got to the exact reason...
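For reference, the tuning described above amounts to something like this (the 512 MiB value is the 10GigE example from the text; vm.min_free_kbytes takes kilobytes, so 512 MiB = 524288 KB — size it for your own link speed):

```shell
# Reserve ~0.5 s worth of 10GigE burst as free memory (illustrative value)
sysctl -w vm.min_free_kbytes=524288

# To persist across reboots, add the setting to /etc/sysctl.conf:
#   vm.min_free_kbytes = 524288
```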



However, for large file performance I really don't buy into the notion that
it's a good idea to break everything into tiny puny blocks. The potential
for wasting CPU cycles on this micro-management is rather big...

I don't think that a readv/writev of 16 iovecs of 8KB is (noticeably)
slower than a read/write of a contiguous 128K; both might well end up
in a scatterlist for the kernel/hardware.

Ah, true. Scatter/gather is magic...

I do find iovecs useful, it's the small blocks that get me into
skeptic mode...

Small blocks are not for networking, they're for internal use only.
And remember that TLS records are 16K max anyway; give 128KB to
OpenSSL's SSL_write and it will output 8 chunks of 16KB.

Oh, I had completely missed that limit on TLS record size...

Kinda related: We also have the support for larger page sizes with modern
CPUs. Has anyone investigated if it makes sense allocating memory pools in
chunks that fit those large pages?

I think PPC64 has 64K pages already.
APR pools are already based on the page size IIRC.

I was thinking of the huge/large pages available on x86 CPUs: 2 MiB and maybe 1 GiB, IIRC.

My thought was that doing 2 MiB allocations for the large memory pools instead of 4k might make sense for configurations with a lot of threads, which end up allocating that much memory eventually anyway: one large page instead of lots of small ones. On Linux, transparent huge page support, when enabled, can take advantage of this, leading to fewer TLB entries/misses.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     ni...@acc.umu.se
---------------------------------------------------------------------------
 *  <- Tribble  *  <- Tribble having Safe Sex
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
