Thanks Vic!

JVM heap size and garbage collection seem to be under control. Believe me, this is well looked at by both us and IBM's finest, since it is a huge app (15 IFLs at peak) :)... We've had our share of these and of badly written code before... The generational GC, new with 6.1, seems to make a *phenomenal* difference (and I don't say that lightly, never having believed that perf knobs at the app level save you much of anything at all), saving 13% in CPU and shaving 100ms off response times (on a 500ms response time transaction). Unbelievable... So I gotta believe they are close to as good as it gets for your average programmer... Course there's always the outlier/different transaction that could be coming in and gumming up the whole system...

And of course WAS support says all of their leaks are fixed now :)) (and there are some significant ones there, apparently, in fixpacks earlier than the current 11, if you study the WAS 6 support site :)). They are saying this is a "native memory" leak, not in the JVM heap, so the tracing that is needed is totally invasive and therefore nearly impossible in our env. And that there is a possibility that it will *stabilize* at maybe 3-5 Gig, thus telling us what the virtual memory size should be... (hard to believe when it was so happy, even overcommitted and probably needing 1.3G, on WAS 5/SLES8 at 1.5 Gig, but you know 64bit is bigger :) ). We're leaving some up with larger swap sizes now, pending the "stabilization" or near crash, whichever comes first.
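One non-invasive way to watch for the "stabilization" mentioned above might be to sample the process's VmSize and VmRSS lines from /proc/<pid>/status at intervals and check whether growth has flattened out. The following is only a rough sketch under stated assumptions: the function names, the six-sample window and the 2% tolerance are all hypothetical choices for illustration, not anything WAS support or the thread recommends.

```python
import re

def parse_vm_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        m = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def has_stabilized(samples_kb, window=6, tolerance=0.02):
    """Return True if the last `window` samples grew by no more than
    `tolerance` (a fraction of the first sample in the window).

    window and tolerance are hypothetical defaults; tune them to the
    sampling interval and to how noisy the workload is.
    """
    if len(samples_kb) < window:
        return False  # not enough history to judge yet
    recent = samples_kb[-window:]
    return (recent[-1] - recent[0]) <= tolerance * recent[0]
```

On a live guest the status text would come from reading /proc/<pid>/status for the WAS process, for example once an hour from cron, with the VmSize values appended to a list or log and passed to has_stabilized.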
Swapoff does appear to cause some long pauses, so we can't do that in production :( We can't afford to lose even 1 second, because that results in ATMs not reaching back end systems... We recycle weekly anyway for DR reasons... so for now we just need to make it 7 days without loss of response time.

It probably does make more sense to keep adding vdisks & VM paging volumes rather than dedicated disk for swapping. At least they'll all share that way (clustered app with a few servers on each LPAR)...

Now on the other hand... in our test environment, with probably 35 out of 100 running WAS6, it becomes not an aberration but the norm for the load we have there, unfortunately. Luckily the paging system is so robust (I think we hit 20K per sec to DASD in today's Monday morning fun). More experimentation is definitely needed there!

Marcy Cortes

"This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Vic Cross
Sent: Monday, October 29, 2007 7:29 PM
To: [email protected]
Subject: Re: [LINUX-390] Swap oddities

On Tue, 30 Oct 2007 06:19:33 am Marcy Cortes wrote:
> I'm not sure it is working as designed.

I never said it was a good design -- and perhaps I should have read your earlier messages prior to saying that. :) It does depend on your point of view though -- it's another one of these aspects that betrays Linux's single-system non-resource-sharing heritage. In a non-shared environment, keeping swap pages hanging around on disk is a good design point in that it can realistically save costly I/O. It's not so good for us though.
:)

> Eventually, when we use up our
> swap, WAS crashes OOM (that's *our* real issue, at least our biggest
> one anyway :).

Yes... and that's not going to be solved by CMM or creating different swap VDISKs or anything like that. The earlier hints about JVM heap size and garbage collection and so on will be useful here. I guess the application is being checked for leaks as well -- or do your developers write perfect code first-time-every-time too? ;-P

> But if we are able to swapoff/swapon and recover that space without
> crashing WAS that kinda says to me that it didn't need it anyway -
> course I haven't tried that whilst workload was running through...
> Maybe it is destructive.

It might be, but as long as your Linux has more free virtual memory than the amount of pages in use on the device you want to remove, you *should* be able to do a swapoff without impact (things might get a little sluggish for a few seconds while kswapd shuffles things around though).

It would be nice to be able to tell accurately just how much swap space is being used on a device -- /proc/meminfo is system-wide. SwapCached in /proc/meminfo is a helpful indicator that counts the swap space "hanging around" (you could try http://www.linuxweblog.com/meminfo among heaps of other places for more info about what the numbers from meminfo mean); if this number is low compared to your total available swap then you're not likely to get much benefit from swapoff/swapon cycles.

> We plan to experiment some with the vm.swappiness and see if that helps.
> I guess in the very least, we can add enough vdisks and enough VM
> paging packs to get through week without a recycle until we figure
> this out as long as response time & cpu savings remain this good with 6.1.

Good plan, although vm.swappiness is only likely to delay your swap usage rather than eliminate it entirely (if something is asking for that much memory, at some point it's going to have to get it from somewhere).
Of course, if it delays heavy swapping long enough to get you through the week then that's a win.

While you've got this WAS issue you are *possibly* justified in throwing a DASD swap device at the end of your line of VDISKs (I emphasise possibly because I don't want to offend Rob et al too much). Perhaps the last thing you want would be to just keep adding VDISKs and VM page packs until your VM paging system is consumed by leaked Linux memory. You could do a nightly swapoff/swapon of some of the VDISKs to flush things out and reduce the activity to the DASD swap.

I guess what I'm saying is that you could think about this WAS problem as an aberration rather than the normal operating mode for your system -- don't jeopardise your entire environment for the sake of one problem system, and be prepared to let best-practice slide a bit while you get the issue sorted. Of course you're in a much better position than me to decide if your paging environment needs such protection.

I also transposed my client's problem onto your shop -- I thought you were concerned about the number of pages allocated to VDISKs. That's why I mentioned the stuff about DELETE/DEFINE of your VDISK swaps.

Best of luck with the issue!

Cheerio,
Vic Cross

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
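The swapoff rule of thumb discussed in this thread (more free virtual memory than the pages in use on the device being removed) can be roughed out in a few lines. While /proc/meminfo is system-wide, /proc/swaps does list a Size and Used column (in kB) for each active swap area, so a per-device figure is available there. What follows is a minimal sketch only, assuming a deliberately conservative reading of "free virtual memory" as MemFree plus the free swap on the *other* devices; it ignores reclaimable page cache, and the function names are hypothetical.

```python
def parse_swaps(swaps_text):
    """Map each swap device to (size_kb, used_kb) from the text of /proc/swaps."""
    devices = {}
    for line in swaps_text.splitlines()[1:]:  # first line is the column header
        parts = line.split()
        if len(parts) >= 5:  # Filename Type Size Used Priority
            devices[parts[0]] = (int(parts[2]), int(parts[3]))
    return devices

def parse_meminfo(meminfo_text):
    """Map /proc/meminfo keys (MemFree, SwapFree, SwapCached, ...) to integer values."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def swapoff_headroom_kb(device, swaps, meminfo):
    """kB left over if everything on `device` had to move elsewhere.

    Assumption: "elsewhere" is MemFree plus free swap on the remaining
    devices. A negative result suggests swapoff could push the guest
    towards OOM.
    """
    size_kb, used_kb = swaps[device]
    free_on_device = size_kb - used_kb
    free_elsewhere = meminfo["MemFree"] + meminfo["SwapFree"] - free_on_device
    return free_elsewhere - used_kb

def swap_cached_fraction(meminfo):
    """SwapCached as a fraction of SwapTotal (0.0 if no swap is configured)."""
    total = meminfo.get("SwapTotal", 0)
    return meminfo.get("SwapCached", 0) / total if total else 0.0
```

On a live guest the two inputs would be the contents of /proc/swaps and /proc/meminfo read at the same moment. A comfortably positive headroom is a hint rather than a guarantee, since the workload keeps allocating while swapoff runs.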
