Hi Mike,

Mike Lambert wrote:
Trond, any thoughts?

Trond is actually on vacation, but I did steal a few cycles of his time and asked about this.
I'd like to double-check that there isn't a reason we can't support
preallocation without getpagesizes() before attempting to manually
patch memcache and play with our production system here.

There's no reason you can't do that. There may be a slightly cleaner integration approach that Trond and I talked through; I'll try to code that up here in the next few days. For now, though, you could try your approach and see whether it helps alleviate the issue you were seeing.

Incidentally, how did the memory fragmentation manifest itself on your system? I mean, could you see any effect on the apps running on the system?


Thanks,
Mike

On Jul 13, 8:38 pm, Mike Lambert <[email protected]> wrote:
On Jul 10, 1:37 pm, Matt Ingenthron <[email protected]> wrote:

Mike Lambert wrote:
Currently the -L flag is only enabled if
HAVE_GETPAGESIZES && HAVE_MEMCNTL. I'm curious what the motivation for
that is. In our experience, for some memcache pools we end up
fragmenting memory due to the repeated allocation of 1MB slabs
interleaved with all the other hashtable and free-list allocations
going on. We know we want to allocate all memory upfront, but can't
seem to do that on a Linux system.
The primary motivation was more about not beating up the CPU's TLB
when running with large heaps.  There are users with large heaps
already, so this should help if the underlying OS supports large
pages.  TLB sizes are getting bigger in CPUs, but virtualization is
more common and memory heaps are growing even faster, so that pressure
isn't going away.
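(For reference, the Solaris large-page path that the
HAVE_GETPAGESIZES && HAVE_MEMCNTL check guards looks roughly like the
sketch below; this shows the mechanism rather than the exact code in
memcached, and both calls are Solaris-specific.)

    #include <sys/mman.h>    /* memcntl(), MC_HAT_ADVISE, struct memcntl_mha (Solaris) */
    #include <sys/types.h>

    /* Find the largest page size the OS supports and advise the kernel to
     * back the heap/BSS with it, so large slab heaps put less pressure on
     * the TLB. */
    static int advise_large_pages(void) {
        size_t sizes[32];
        int avail = getpagesizes(sizes, 32);
        if (avail <= 0)
            return -1;

        size_t largest = sizes[0];
        for (int i = 1; i < avail; i++)
            if (sizes[i] > largest)
                largest = sizes[i];

        struct memcntl_mha mha;
        mha.mha_cmd = MHA_MAPSIZE_BSSBRK;   /* preferred page size for heap/BSS */
        mha.mha_flags = 0;
        mha.mha_pagesize = largest;
        return memcntl(0, 0, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0);
    }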
I'd like to have some empirical data on how big a difference the -L flag
makes, but that assumes a workload profile.  I should be able to hack
one up and do this with memcachetest, but I've just not done it yet.  :)
To put it more concretely, here is a proposed change to make -L do a
contiguous preallocation even on machines without getpagesizes tuning.
My memcached server doesn't seem to crash, but I'm not sure if that's
a proper litmus test. What are the pros/cons of doing something like
this?
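(The patch itself isn't reproduced here, but the kind of preallocation
being described would look something like the sketch below; names such
as preallocate_cache and allocate_slab_page are illustrative, and
memcached keeps its equivalents inside slabs.c.)

    #include <stdlib.h>

    static void  *mem_base    = NULL;   /* start of the preallocated arena */
    static void  *mem_current = NULL;   /* next unassigned byte            */
    static size_t mem_avail   = 0;      /* bytes left in the arena         */

    /* Grab the entire cache limit as one contiguous allocation up front,
     * instead of malloc()ing 1MB slab pages on demand as items arrive. */
    static int preallocate_cache(size_t limit) {
        mem_base = malloc(limit);
        if (mem_base == NULL)
            return -1;
        mem_current = mem_base;
        mem_avail = limit;
        return 0;
    }

    /* Each new slab page is then carved out of the arena rather than being
     * a fresh malloc() that can interleave with other heap allocations. */
    static void *allocate_slab_page(size_t page_size) {
        if (page_size > mem_avail)
            return NULL;
        void *ptr = mem_current;
        mem_current = (char *)mem_current + page_size;
        mem_avail -= page_size;
        return ptr;
    }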
This feels more related to the -k flag, and it should probably be
using madvise() in there somewhere too.  It wouldn't necessarily be a
bad idea to separate these.  I don't know that the day after 1.4.0 is
the day to redefine -L, though it's not necessarily bad. We should
wait for Trond's response to see what he thinks about this, since he
implemented it.  :)
Haha, yeah, the release of 1.4.0 reminded me I wanted to send this
email. Sorry for the bad timing.

-k keeps the memory from getting paged out to disk (which is a very
good thing in our case).
-L appears to me (not knowing what getpagesizes() does) to be related
to preallocation with big allocations, which I thought was what I
wanted.
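(For anyone following along: the page-out protection -k provides comes
from the mlockall() mechanism, at least on platforms that have it. A
minimal sketch of the idea:)

    #include <stdio.h>
    #include <sys/mman.h>

    /* Lock every page the process has now (MCL_CURRENT) and every page it
     * maps later (MCL_FUTURE) into RAM, so cached data never gets swapped
     * out.  Needs an adequate RLIMIT_MEMLOCK or root to succeed. */
    int lock_all_memory(void) {
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return -1;
        }
        return 0;
    }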

If you want, I'd be just as happy with a -A flag that turns on
preallocation, but without any of the getpagesizes() tuning. It'd
force one big slab allocation and that's it.

Also, I did some testing with this (-L) some time back (admittedly on
OpenSolaris), and the actual behavior will vary based on the memory
allocation library you're using and what it does with the OS
underneath.  I didn't try Linux variations, but that may be worthwhile
for you.  IIRC, default malloc would wait for a page fault to do the
actual memory allocation, so there'd still be a risk of fragmentation.
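(To make that concrete: a big malloc() typically only reserves address
space, and pages aren't backed until they're touched, so one way to
force the allocation at startup is to pre-fault the block. Just a
sketch; whether it actually helps with fragmentation depends on the
allocator underneath:)

    #include <stdlib.h>
    #include <unistd.h>

    /* Reserve the arena, then touch one byte per page so the kernel backs
     * the whole range with real memory at startup instead of on first use. */
    void *preallocate_and_fault(size_t bytes) {
        char *base = malloc(bytes);
        if (base == NULL)
            return NULL;

        long page = sysconf(_SC_PAGESIZE);
        for (size_t off = 0; off < bytes; off += (size_t)page)
            ((volatile char *)base)[off] = 0;   /* fault each page in */

        return base;
    }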
We do use Linux, but haven't tested in production with my modified -L
patch. What I *have* noticed is that when we allocate a 512MB
hashtable, it shows up in Linux as an mmap-ed contiguous block of
memory. From http://m.linuxjournal.com/article/6390: "For very large
requests, malloc() uses the mmap() system call to find addressable
memory space. This process helps reduce the negative effects of
memory fragmentation when large blocks of memory are freed but locked
by smaller, more recently allocated blocks lying between them and the
end of the allocated space."
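(If it's useful for testing, glibc also exposes the cutoff that decides
when malloc() is served by mmap(); something like the sketch below,
using the glibc-specific M_MMAP_THRESHOLD knob, would push slab-sized
allocations onto that path:)

    #include <malloc.h>
    #include <stdlib.h>

    int main(void) {
        /* Ask glibc to serve any allocation of 1MB or more via mmap(), so
         * slab-sized requests land in their own mappings instead of the
         * main heap. */
        mallopt(M_MMAP_THRESHOLD, 1024 * 1024);

        void *slab = malloc(1024 * 1024);   /* should now be mmap()-backed */
        free(slab);                         /* ...and returned to the OS on free() */
        return 0;
    }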

I was hoping to get the same kind of large mmap for all our slabs,
out of the way in a separate part of the address space, in a way that
didn't interfere with the actual memory allocator itself, so that the
Linux allocator could then focus on balancing just the small
allocations without any page waste.
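(The most direct way to get that effect, independent of what malloc()
decides to do, would probably be to mmap() the whole slab arena
anonymously at startup. Again, just a sketch of the idea, not a tested
patch:)

    #include <stddef.h>
    #include <sys/mman.h>

    /* Map the full slab arena as one anonymous region, entirely outside the
     * malloc() heap, so slab pages never interleave with the hashtable and
     * free-list allocations that malloc() manages. */
    void *map_slab_arena(size_t limit) {
        void *base = mmap(NULL, limit, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return (base == MAP_FAILED) ? NULL : base;
    }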

Thanks,
Mike
