On May 13, 2005, at 9:17 PM, Charles Lockhart wrote:
Brian Chee wrote:
Actually I have a question... why would you want to run a machine
without swap? There are good reasons if you're running an embedded
Linux machine, but for normal machines I've seen folks set up Unix
boxes that boot from the network but ONLY do swap to the local hard
disk. Those old xterms did this a lot just so that you don't swap over
the network.
We're not using our machines as general desktop platforms. They're
part of a system. In the current case I'm looking at, the computer is
receiving data via a fiber link on one PCI bus, processing that data,
and then writing the data to disk across a second PCI bus while we're
still reading in more data over the first PCI bus. We've (hopefully)
managed to handle the contention issues internal to the program, but
we found that the system loses balance and starts dropping data
(irreplaceable data) somewhat randomly if physical memory fills up and
the mm starts using the swap partition. Basically the primary
application will be working fine, then we'll start up some other
stuff (slickedit, firefox, tkcvs, etc.), and at some point the swap
partition starts getting used and the primary application performance
starts being randomly flaky. This would be fine if we had some big
shiny flag that would shoot up and alert the user that the system
needs to be re-balanced. But we don't.
One way that we could possibly fix this is to just disable the swap
partition. I'd been hoping that new applications exceeding physical
memory at process load would just fail, flagging to the user that
they're misbehaving, but instead the machine just slows down
a lot. This is slightly more problematic for how we use the system.
I've also talked to other people who were designing instrumentation
for astronomy, and their interest in getting rid of the drives was
based on what I'm told is a high rate of disk failure at altitude. If
the primary source of failure is the disk, then why have it? But,
please somebody correct me if I'm wrong: no disk, no swap space?
Man, you gotta love all the white-out on that black project tech. :-)
Not all hard drives have disks, it turns out. For (ahem) mil-spec
(did I say that?) applications at altitude (most folks don't know just
how hard an airplane can vibrate under certain conditions; this makes
the heads hit the disks, which is... bad), some of the (cough)
contractors with which I am familiar use various (battery-backed) RAM-
or flash-based disks.
Companies like BiTMICRO Networks, M-Systems, and Texas Memory Systems
all make IDE- or SCSI-based drives that are completely solid state.
Might be worth thinking about, not so much for swap, but you're writing
down that data *someplace* inside the airframe, no?
As for altitude without the (cough) "mobile platform" aspect (so no
high cyclic rate, high-G vibrations), consumer-grade drives can't
stomach high altitude quite literally because the air is (too) thin, so
the heads fly on a thinner cushion of air, and this is a bit too close.
It's a bummer.
As for keeping your application running while the VM system tries to be
fair to firefox or slickedit:
1) which kernel are you running?
2) have you considered mlock()/mlockall()?
(post 2.6.9 you don't have to be (e)uid==0 to successfully call
mlock() and friends)
3) you might also look at sched_setscheduler() and friends (assuming
linux 2.6 kernels)
4) for the truly time-critical application, you could look at RTLinux
and friends.
#2 and #3 can probably be combined to ensure that it's firefox, tkcvs,
and slickedit that get sick in a low-free-pages situation. You'll want
to be careful not to livelock yourself, create priority inversion, etc.
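For what it's worth, here's a minimal sketch of what combining #2 and
#3 might look like on a 2.6 kernel. You'd call something like this
early in the critical app's main(); the function name and the
SCHED_FIFO priority of 50 are just examples, and it assumes root (or,
post 2.6.9, a generous RLIMIT_MEMLOCK plus the right to change
scheduling class):

    /* Sketch: pin the critical app in core and lift it out of the
     * normal timesharing class. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sched.h>
    #include <sys/mman.h>

    int go_realtime(void)
    {
        struct sched_param sp;

        /* Lock every current and future page in RAM so the pager
         * can't evict us when firefox fills up physical memory. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            fprintf(stderr, "mlockall: %s\n", strerror(errno));
            return -1;
        }

        /* SCHED_FIFO preempts all normal (SCHED_OTHER) tasks;
         * priority 50 is a middle-of-the-range example. */
        memset(&sp, 0, sizeof(sp));
        sp.sched_priority = 50;
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
            fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
            return -1;
        }
        return 0;
    }

Keep the caveat above in mind: a SCHED_FIFO task that spins without
ever blocking will starve everything below it, which is exactly the
livelock you don't want.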
You could also call setrlimit(RLIMIT_AS, ...) in the parent of any
process that is likely to start firefox, tkcvs, etc. This will smack
any program that attempts to allocate over the limit you set. If you
want to be somewhat kinder, you could setrlimit(RLIMIT_RSS, ...) to
limit the resident set of these "unclean" programs. That will make
the pager work harder, but if you've a) locked the critical
pages/applications in core and b) told everyone else that they can only
have X MB (each) of resident pages, you might find a solution.
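Here's a sketch of the RLIMIT_AS flavor (of the two, RLIMIT_AS is the
one Linux reliably enforces; the 256 MB cap and the wrapper name are
just examples). Since the limit is inherited across fork()/exec(), a
tiny launcher is enough, e.g. "./cap_as firefox":

    /* cap_as.c -- sketch: run a program under a hard address-space
     * cap.  256 MB is an arbitrary example limit. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/resource.h>

    int main(int argc, char **argv)
    {
        struct rlimit rl;

        if (argc < 2) {
            fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
            return 1;
        }

        /* Past the cap, any brk()/mmap() fails with ENOMEM instead
         * of dragging the whole box into swap. */
        rl.rlim_cur = rl.rlim_max = 256UL * 1024 * 1024;
        if (setrlimit(RLIMIT_AS, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }

        execvp(argv[1], &argv[1]);
        perror("execvp");
        return 1;
    }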
They're just ideas.
jim