On Sun, 28 Oct 2007 08:41:16 am Marcy Cortes wrote:

> So, if I'm understanding right, those would be dirty pages no longer
> needed hanging out there in swap?

That's right -- but you'll get arguments on the definition of "no longer
needed".  Having sent a page to the swap device, Linux will keep the copy out
there even after the page gets swapped back in.  The reason: if the page needs
to be swapped out again, and it wasn't modified while it was back in memory,
you save an I/O.  So the claim is not that the page is "no longer needed", but
that it's "not needed right now but might be again soon".
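
One way to watch this on a running guest is the SwapCached line in
/proc/meminfo, which counts pages that are back in memory but still have a
copy on the swap device.  A minimal Python sketch, assuming the usual
"Name: value kB" layout of /proc/meminfo; the function names and the sample
figures are made up for illustration:

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields (numeric values, kB where given)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(fields):
    """Swap space in use, and how much memory is also held on swap (swap cache)."""
    used = fields["SwapTotal"] - fields["SwapFree"]
    return {"swap_used_kb": used, "swap_cached_kb": fields.get("SwapCached", 0)}


# Made-up sample so the sketch runs anywhere; on a real guest, pass
# open("/proc/meminfo").read() instead.
SAMPLE = """MemTotal:  1024000 kB
SwapCached:   20480 kB
SwapTotal:   768000 kB
SwapFree:    512000 kB
"""

print(swap_summary(parse_meminfo(SAMPLE)))
```

A large swap_used_kb alongside a quiet system is consistent with the "keep it
in case we need it" behaviour rather than with active swapping.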

I read about this and other interesting behaviours at http://linux-mm.org -- 
it seems that the operation of Linux's memory management has generated enough 
discussion for someone to start a wiki on it. :)

The real issue in terms of VDISK is that even if we could eliminate the "keep
it in case we need it" behaviour of Linux, there's no way for Linux to inform
CP that a page of a VDISK is no longer needed and can be de-allocated.  Even
doing swapoff/swapon with an intervening mkswap, or even using chccwdev to
take the thing offline from Linux and bring it back on again, won't tell CP
that it can flush the disk -- AFAIK, only DELETE/DEFINE would do it.

> I thought the point of the prioritized
> swap was that it'd keep reusing those on the highest numbered disks
> before starting down to the next disk.  It was well into the 3rd disk
> (they are like 250M, 500M, 1G, 1G).   (at least I think it used to work
> that way!).  Could there be a linux bug here?

From what I've seen, Linux is working as designed, unfortunately.  The
hierarchy of swap devices was a theory (tested by others much more skilled
and better equipped than me, even though I drew the funny pictures of it in
the ISP/ASP Redbook).  Regardless, it was only meant as an indicator of how
big your *central storage* needs to be; as soon as the guest touched the
second disk, that was a flag to increase the central.  (Can't increase
central?  Divide the workload across a number of guests.)  Ideally you *never*
want to swap; having a swap device that's almost as fast as memory helps
mitigate the cost of swapping, but using that fast swap is not a habit to
keep up.

It's also quite possible that your smaller devices became fragmented and 
unable to satisfy a request for a large number of contiguous pages.  Such 
fragmentation would make it ever more likely that the later devices would get 
swapped-onto as your uptime wore on.
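
To check how far down the hierarchy a guest has gone, /proc/swaps lists each
swap device with its size, usage and priority.  The Python sketch below sorts
the devices by priority (highest first, which is the order Linux fills them)
and reports which ones have been touched.  The device names and figures are
hypothetical, and the output shows only per-device usage, not fragmentation:

```python
def parse_swaps(text):
    """Parse /proc/swaps content into a list of dicts, highest priority first."""
    devices = []
    for line in text.splitlines():
        parts = line.split()
        # Data lines end in "Size Used Priority"; the header line does not.
        if len(parts) < 5 or not parts[-1].lstrip("-").isdigit():
            continue
        devices.append({
            "name": parts[0],
            "size_kb": int(parts[-3]),
            "used_kb": int(parts[-2]),
            "priority": int(parts[-1]),
        })
    return sorted(devices, key=lambda d: d["priority"], reverse=True)


def touched_devices(devices):
    """Names of swap devices that currently hold at least one page."""
    return [d["name"] for d in devices if d["used_kb"] > 0]


# Hypothetical sample; on a real guest, pass open("/proc/swaps").read().
SAMPLE = """Filename      Type        Size     Used    Priority
/dev/dasdb1   partition   256000   256000  40
/dev/dasdc1   partition   512000   512000  30
/dev/dasdd1   partition   1024000  300000  20
/dev/dasde1   partition   1024000  0       10
"""

devices = parse_swaps(SAMPLE)
print("touched:", touched_devices(devices))
```

By the rule of thumb above, more than one name in the "touched" list is the
flag to look at central storage.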

> Seems like vm.swappiness=0 (or at least a lower number than the default
> of 60) would be a good setting for Linux under VM. Has anyone studied
> this?

/proc/sys/vm/swappiness was introduced with kernel 2.6 [1].  The doco suggests 
that using swappiness=0 makes the kernel behave like it used to in the 2.4 
(and earlier) days -- sacrifice cache to reduce swapping.  I have seen SLES 9 
systems (with 2.6 kernels) appear to use far more memory than equivalent SLES 
8 systems (kernel 2.4), so from experience a low value is useful for the z/VM 
environment [2].
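
For anyone wanting to experiment, the value can be read and set through the
/proc file directly.  A small Python sketch, assuming the 0 to 100 range
described in [1] (later kernels may accept a wider range); the helper names
are my own, and writing to the real file needs root:

```python
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def write_swappiness(value, path=SWAPPINESS_PATH):
    """Set vm.swappiness; requires root when path is the real /proc file."""
    # Range check is an assumption based on the 0..100 range in this note.
    if not 0 <= value <= 100:
        raise ValueError("swappiness is expected to be between 0 and 100")
    Path(path).write_text(f"{value}\n")


# Only report the live value where the file actually exists.
if SWAPPINESS_PATH.exists():
    print("current vm.swappiness:", read_swappiness())
```

A change made this way lasts only until the next IPL; a "vm.swappiness = 0"
line in /etc/sysctl.conf is the usual way to make it stick.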

CMM is meant to be the remedy for all of this, of course.  Now we can give all
our Linux guests a central storage allocation beyond their wildest dreams 
(I'm kidding), and let VMRM handle the dirty work for us.  I could imagine 
that we could be a bit more relaxed about our vm.swappiness value then -- we 
still don't want each of our penguins to buffer up its disks, but perhaps the 
consequences aren't as severe when allocations are more fluid and more 
effective sharing is taking place [3].  Unfortunately I haven't used CMM in
anger as I'm a little light on systems to play with nowadays.

Cheerio,
Vic Cross

[1] "Swappiness" controls the likelihood that a given page of memory will be 
retained as cache if the kernel needs memory -- it's a range from 100 (means 
cache pages are preserved and non-cache pages are swapped out to satisfy the 
request) to 0 (means cache pages are flushed to free memory to satisfy the 
request).
[2] If only to preserve the way that we used to tune our guests prior to 
2.6. :)
[3] We might even be able to do the Embedded Linux thing and disable swapping 
entirely!
