Long response here. Shared disk is the way to go!!
On Thu, 27 Jul 2006, MOEUR TIM C wrote:
> I'm pursuing an architecture for multiple guests under VM
> and I'd like to know if anyone else has done the same,
> or if this is just an accident waiting to happen. ...
Yes, done the same here. Accident affinity? Naaahhh!
Shared filesystems, by way of z/VM shared (linked) disks,
is something I've done for years. I strongly recommend it.
YOU WILL NOT get support from S/W vendors who "don't get it".
You will have to lean on some. But you are the customer, so lean!
Virtualization is here *now*, not some futuristic idea.
Sharing disks is more than just SAN.
> Here's what I'm considering: I'd like to create multiple VM Linux
> guests that each have read access to a set of common minidisks.
> On those common minidisks will be what I'm calling the shared Linux
> file systems, such as stuff in /sbin, /bin, /boot, /lib. Each VM
> Linux guest will also have an exclusive minidisk (WRITE) that will
> contain the file systems needed to update and operate (/etc, /proc,
> /sys, /tmp and so on). The assumption is that each Linux guest will
> use the same level of OS, patches, and add-on programs.
Sounds fine.
And REQUIRING each guest Linux to be at the same patch level
is a Good Thing, reduces maint headaches, and certainly fosters
your objective of sharing common op sys content.
/sys and /proc are creatures of the kernel. Don't worry about them.
/tmp can be private space or can be "tmpfs" (which is memory).
You probably never want to share /tmp. If you did, go NFS, not disk.
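A minimal sketch of the tmpfs option, for illustration. The helper name
tmpfs_fstab_line and the 256m default are assumptions, not anything from
this thread; size= and mode= are ordinary tmpfs mount options. Size it to
what your guests can spare, since tmpfs pages come out of guest memory.

```shell
# Hypothetical helper: print an /etc/fstab line that puts /tmp on tmpfs.
# The 256m default is an illustrative assumption, not a recommendation.
tmpfs_fstab_line() {
    size="${1:-256m}"
    # mode=1777 gives the usual sticky, world-writable /tmp permissions.
    printf 'tmpfs\t/tmp\ttmpfs\tsize=%s,mode=1777\t0 0\n' "$size"
}
```

The output is meant to be appended to the private /etc/fstab of each guest,
for example: tmpfs_fstab_line 128m >> /etc/fstab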
> I have two goals with this idea -
>
> 1) to limit maintenance procedures. I'd only have to apply patches to
> one image and all of the Linux guests would be affected.
> 2) conserve DASD.
#1 is where you will see real value.
#2 is like buying a hybrid or electric car:
Sure, you'll save money, but not AS MUCH as you'll be
reaping other benefits of your superior engineering.
[I have deleted your "reasons not to do this" for brevity
because I'm too long winded on this subject even without it.]
Sir Santa Sez: Share Storage, Save Sanity
We do virtualization (henceforth called "V12N") in order to share
resources. (We also do it to get that lovely isolation where one guest
cannot clobber another. But we could get that effect from discrete boxes,
so we're back to sharing as the main reason for V12N.) If one of the
"resources" to be shared is the systems programming staff, then we
don't want to lose that advantage by having the sys progs hit each
virtual penguin every time a patch comes out. At my shop, we have
several hundred tuxes. THEY DO NOT all get patched. It just
doesn't happen. Too much work. Too much risk!
Now ... we really do need to talk about RPM and how it
does not grok shared disk, especially not read-only disk.
The solution is blindingly simple: Don't use RPM. That is,
don't run RPM on the penguins which share the R/O disks.
Limit RPM to some kind of master system. When that is ready to go,
let the others have at the (new) disks. I'm leaving out some detail.
Reading Robert Nix's note, I get a strong sense of RPM pain.
There will be some "reconciliation" to do, maybe less than you'd think.
NOTE: Solaris has shared certain filesystems among multiple Suns
for almost two decades. In particular, they can easily share /usr
across many "client" systems. Software maint is simplified,
but yes, there has to be a way to reconcile things which fall
outside of /usr or whatever filesystems are shared. This is not new!
There will always be one penguin that wants special attention,
requires some app that the rest don't need. That's a "one off".
You will need to figure out how to fix those. But if you know
ahead of time that they're coming, life gets easier.
*** SHARED /usr AND /opt WITH PRIVATE ROOT ***
Like Dominic said, you don't have to share /usr to have it be
mounted read-only. Many people recommend (we're talking discrete
systems now) that /boot be read-only, if only because people
should not be writing to it. The same goes for /usr: If it is
on its own volume, get in the habit of defining it R/O. For maint,
just remount it R/W on the fly. This is trivial.
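As a sketch of that remount dance on a system that owns its /usr volume
(a guest holding only an RR link has a read-only disk underneath, so this
is for the owner): the names run and usr_maint are made up here, and the
DRY_RUN switch is an addition so the sequence can be previewed without root.

```shell
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical wrapper: open /usr for writing, run one maintenance
# command, then put /usr back to read-only whatever the command returned.
usr_maint() {
    run mount -o remount,rw /usr || return 1
    run "$@"
    rc=$?
    run mount -o remount,ro /usr || return 1
    return "$rc"
}
```

Usage would be along the lines of "usr_maint rpm -Fvh some.rpm", with the
package name standing in for whatever you are applying.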
But I'm getting away from the point: Let multiple Linuxen on VM
have a link to the one minidisk where a common /usr resides.
That will be read-only. And that's a good thing.
    USER LINUX ...
    ... other stuff ...
    LINK MAINT 1B0 1B0 RR
    MDISK 1B1 3390 ... ... volser MR
As an example, the 1B0 disk is bootable and has three partitions:
/boot, /usr, and /opt. (The DASD driver only supports up to three
partitions. The boot disk must be partitioned to save room for the
IPL text in the first track.) The 1B1 disk is a copy of MAINT 1B1,
with the root partition and maybe /var if you choose to split it out.
Come time for the reconciliation (after patching), you'll want to
LINK MAINT 1B1 to some available address on the "clients" and copy any
changed files. This may sound like trouble, but it's not so bad.
(And below I'll discuss read-only root which is even mo betta!)
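One way to find what needs copying during that reconciliation, as a sketch
only: the function name changed_files and the mount points in the usage
note are hypothetical. It assumes the master root (MAINT 1B1) has already
been linked and mounted read-only somewhere on the client.

```shell
# Hypothetical helper: list regular files under MASTER_DIR that are
# missing from, or differ from, the same relative path under CLIENT_DIR.
changed_files() {
    master="$1"
    client="$2"
    ( cd "$master" && find . -type f | sort ) | while IFS= read -r f; do
        # cmp -s is silent; non-zero means "different" or "not there".
        if ! cmp -s "$master/$f" "$client/$f" 2>/dev/null; then
            printf '%s\n' "${f#./}"
        fi
    done
}
```

For a private root you would run it per directory, for instance
"changed_files /mnt/master/sbin /sbin", review the list, and then copy.
It only reports; it deliberately does not copy anything on its own.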
To patch, sign on to MAINT (or wherever you want to do your work).
    USER MAINT ...
    ... other stuff ...
    MDISK 1B0 3390 ... ... volser RR
    MDISK 1B1 3390 ... ... volser RR
    MDISK 4B0 3390 ... ... volser MR
    MDISK 4B1 3390 ... ... volser MR
What you want to do is,
    cp detach 1b0 1b1
    cp link * 4b0 1b0 mr
    cp link * 4b1 1b1 mr
THEN IPL Linux to apply your patches. You no doubt have
your own requirements for stability and acceptance. Once you're happy,
shut it down, detach and SWAP THE MINIDISKS. Note what Ed MacK said
about R/W on one v-machine with R/O by others. Be sure to swizzle
your minidisks carefully. You might want to have three or more sets.
You must have AT LEAST TWO sets.
You will have to reconcile /bin, /lib (and /lib64), and /sbin
when using a private root disk. Shared root relieves that hassle.
You will have to reconcile /etc and /var in any case.
*** SHARED /usr AND /opt AND READ-ONLY SHARED ROOT ***
Here's a neat trick from z/OS: Mount the root FS read only.
Actually, they're not the only ones to do that, and they're
certainly not the first to do it. But it's a stretch for your
traditional Unix sysadmins when they first see it. Freaks them out!
Just another reason for "open systems" people to think that
the mainframe is weird.
On Linux, with a shared read-only root, you have much less
content to resolve when updates are made. MUCH less.
    USER LINUX ...
    ... other stuff ...
    LINK MAINT 1B0 1B0 RR
    LINK MAINT 1B1 1B1 RR
    MDISK 1B2 3390 ... ... volser MR
The trick to making this work is to disable "boot.rootfsck"
(for SuSE) and replace it with something that mounts
your local private content EARLY IN THE BOOT PROCESS.
At my shop, we're working on a "boot.readonlyroot" for this.
(Dunno how to do this in RedHat. Have not checked lately.)
The magic of a read-only root is to have a small private R/W filesystem
checked and mounted very soon, with /etc and other unique-to-each-host
content physically residing there.
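What such an early-boot step might boil down to, as a rough sketch and not
the actual "boot.readonlyroot" script: the function names, the device node
and the /local mount point are all assumptions. The mount point has to
exist already on the shared root, since nothing can be created there.

```shell
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical early-boot step: check the private filesystem, then mount it.
mount_private() {
    dev="$1"   # private R/W device, e.g. the partition on the 1B2 minidisk
    mnt="$2"   # empty directory that already exists on the R/O root
    run fsck -a "$dev"
    rc=$?
    # fsck status 0 = clean, 1 = errors corrected; higher needs a human.
    [ "$rc" -le 1 ] || return "$rc"
    # -n: do not write /etc/mtab, which still sits on the read-only root.
    run mount -n "$dev" "$mnt"
}
```

A call might look like "mount_private /dev/dasdc1 /local", where the device
name depends entirely on how your DASD come up.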
Use a "bind mount" to get /etc fudged into your private space.
The /etc directory must be read-write. No clean way around that,
if only because of "mtab". But /etc is also unique to each guest.
You'll also want to bind mount /root to the private volume.
All of these bind mount suggestions could be done with regular mounts
or with sym-links. We've been using bind mounts because it means
less variance in production from how the distributors lay things out.
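The bind mounts themselves could look like the sketch below. The function
name bind_private and the /local mount point are assumptions; it expects
the private filesystem to be mounted already and to contain one directory
per name passed in.

```shell
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical helper: bind directories from the private R/W volume
# over the same-named directories on the shared read-only root.
bind_private() {
    priv="$1"   # where the private filesystem is mounted
    shift
    for d in "$@"; do
        # -n: skip the /etc/mtab update, since /etc is being replaced.
        run mount -n --bind "$priv/$d" "/$d" || return 1
    done
}
```

For example, "bind_private /local etc root" covers the two directories
named above; add var to the list if /var is not on a disk of its own.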
Typical "mount point" directories include /home, /cdrom, and
of course /mnt. /media is usually handled by the automounter.
These are all just empty directories, whether root is private
or shared read-only. (Well, some people populate /home without
thinking ahead. But it's easily split out into another vol or auto.)
*** SUMMARY ***
Do it! Share the disks. Deal with RPM. Deal with the "one offs".
If you set your hand to the plough, do not look back. You will
be glad of it and will enjoy the harvest.
When doing RPM, separate your maint system from the golden runnables.
For the one-offs, I can suggest a solution, or you can invent one.
There are just too many ways to skin that cat.
root -- put on xx1 disk, R/O or private
/bin -- part of root, whichever you choose
/lib -- "
/lib64 -- "
/sbin -- "
/dev -- "
/tmp -- memory (tmpfs) or a disk
/boot -- easy R/O, put on bootable xx0
/usr -- "
/opt -- "
/media -- automounter fodder
/home -- automounter, NFS, a disk, whatever
/var -- bind to local R/W if root is R/O, or disk
/etc -- bind to local R/W if root is R/O
/root -- "
/proc -- kernel stuff, no worries
/sys -- "
There are a bazillion ways you can take this!
You could put /usr on its own disk, maybe even unpartitioned,
say 1BE which is akin to the CMS 19E. You could split out /usr/man
to a 1BD disk following the same pattern (see CMS 19D). If that buys you
synergy in your maintenance, do it. YOU must decide where you
want to put the shared stuff, how you will lay it out.
-- R;