For a while now I've been exploring the idea of writing a SLURM SPANK plugin to allow users to dynamically change the pagepool size on a node. Every now and then we have some users who would benefit significantly from a much larger pagepool on compute nodes, but by default we keep it on the smaller side to make as much physical memory available as possible to batch work.

In testing, though, it seems as though reducing the pagepool doesn't quite release all of the memory. I don't really understand it because I've never before seen memory that was previously resident become un-resident but still maintain the virtual memory allocation.

Here's what I mean. Let's take a node with 128G and a 1G pagepool.

If I do the following to simulate what might happen as various jobs tweak the pagepool:

- tschpool 64G
- tschpool 1G
- tschpool 32G
- tschpool 1G
- tschpool 32G

I end up with this:

mmfsd thinks there's 32G resident but 64G virt:
# ps -o vsz,rss,comm -p 24397
67589400 33723236 mmfsd
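
For reference, ps prints vsz and rss in KiB, so the figures above can be converted to GiB to check them against the pagepool sizes. A minimal Python sketch; the helper name kib_to_gib is made up for illustration:

```python
# ps reports vsz and rss in KiB; convert to GiB for readability.
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert a KiB count (as printed by ps) to GiB."""
    return kib / KIB_PER_GIB


# Values copied from the ps output above.
vsz_kib, rss_kib = 67589400, 33723236

print(f"virt: {kib_to_gib(vsz_kib):.1f} GiB")
print(f"rss:  {kib_to_gib(rss_kib):.1f} GiB")
```

That works out to roughly 64.5 GiB virtual and 32.2 GiB resident, which lines up with the 64G and 32G pagepool settings.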

However, Linux thinks there's ~100G used:

# free -g
             total       used       free     shared    buffers     cached
Mem:           125        100         25          0          0          0
-/+ buffers/cache:         98         26
Swap:            7          0          7
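
To put a rough number on the discrepancy, the "used" figure from free can be compared against mmfsd's rss. This is only a sketch: it assumes the older procps layout with a "-/+ buffers/cache" row as pasted above, free -g rounds to whole GiB, and other processes on the node use memory too, so the result is approximate. The function name is hypothetical.

```python
# Sample text copied from the free -g output above (older procps layout).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:           125        100         25          0          0          0
-/+ buffers/cache:         98         26
Swap:            7          0          7
"""


def used_excluding_cache_gib(free_output):
    """Return the 'used' column of the '-/+ buffers/cache' row, in GiB."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            return int(line.split(":", 1)[1].split()[0])
    raise ValueError("no '-/+ buffers/cache' row found")


MMFSD_RSS_KIB = 33723236  # rss from the ps output above
mmfsd_rss_gib = MMFSD_RSS_KIB / (1024 * 1024)

used_gib = used_excluding_cache_gib(FREE_OUTPUT)
gap_gib = used_gib - mmfsd_rss_gib

print(f"used (excl. buffers/cache): {used_gib} GiB")
print(f"mmfsd rss:                  {mmfsd_rss_gib:.1f} GiB")
print(f"not covered by mmfsd rss:   {gap_gib:.1f} GiB")
```

With the numbers above, about 66 GiB of the used memory is not accounted for by mmfsd's resident set, which is in the same ballpark as the original 64G pagepool.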

I can jump back and forth between 1G and 32G *after* allocating a 64G pagepool, and the overall amount of memory in use doesn't balloon, but I can't seem to shed that original 64G.

I don't understand what's going on... :) Any ideas? This is with Scale


