Rob van der Heij wrote:
One of the neat things about Diag250 is that it supports a fast-path
when the data is in MDC (the I/O complete is returned immediately,
skipping the entire process of reflecting an interrupt later). Linux
supports that in that you also get a quick route back there. I've even
done Linux multi-pathing on that to exploit it like PAV does. But I
have been pushing this less lately after I was disappointed about some
low-hanging fruit that z/VM was unable to pick...

Ah! MDC!

Minidisk Caching is definitely a double-edged sword! I'm quite certain
(if only intuitively) that there are environments that would benefit
from disabling MDC altogether.

Lemme explain...

Take a well-known RDBMS engine (the one that starts with 'O'). Well...
One thing is certain: it is the entity that knows best (hopefully)
which pages to retain and which pages to flush, right?
Now look at what we have:

O<->z/Linux<->z/VM<->DSxxxx (and I'm not talking about what happens when
you have an SVC in the middle!)

Then you suddenly realize that all these layers implement some form of
caching: some read caching, some write-back caching, some
write-through caching. Basically, your data may reside in five places:
four caches and one permanent copy (maybe more), with little or no
coordination between the various stages. That's a waste of resources to
me! Ideally, when the application does its own caching, the only other
caching left would be at the disk subsystem, since it is the only layer
that knows the actual physical layout and can re-order writes to achieve
the best performance with respect to that layout.
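Where the application does keep its own cache, one way on Linux to avoid holding a second copy in the kernel's page cache is to hint the kernel to drop its duplicate once the application has the data. A minimal sketch using posix_fadvise; the file name and sizes are made up for illustration:

```python
import os

path = "demo_tablespace.dat"  # hypothetical data file
with open(path, "wb") as f:
    f.write(b"\0" * (256 * 4096))  # 256 pages of 4 KiB

fd = os.open(path, os.O_RDONLY)

app_cache = {}                         # the application's own page cache
app_cache[0] = os.pread(fd, 4096, 0)   # read page #0 into it

# The application now holds the only copy it needs; advise the kernel
# that it may discard its duplicate of this file from the page cache.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

os.close(fd)
os.unlink(path)
```

This is only a hint, not a guarantee, but it is one of the few coordination knobs a guest application actually has.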

And without coordination, whatever caching the underlying layers perform
will be driven by heuristics that do not necessarily match the actual
usage pattern.

Now take read-ahead... The various layers may each, in turn, perform
some read-ahead, fetching a series of pages or blocks, while the
application may be doing purely random page picks (and may never need
'Page+1' after having read 'Page'). Now we're talking wasted I/O channel
bandwidth and increased latency if another I/O is needed:

- O -> z/Linux: I want page #nnnn
- z/Linux -> z/VM: OK, lemme get you page #nnnn (and I'll fetch pages
#nnnn+1 to #nnnn+15 while I'm at it)
- z/VM -> DSxxxx: OK, you want pages #nnnn to #nnnn+15
- DSxxxx -> Disks: OK, you want pages #nnnn to #nnnn+15
- DSxxxx -> z/VM: Got you page #nnnn
- z/VM -> z/Linux: Got you page #nnnn
- z/Linux -> O: Got you page #nnnn
- O -> z/Linux: Now get me page #mmmm
- z/Linux: Err... listen, I'm not done getting pages #nnnn+1 to
#nnnn+15 yet. Wait a bit, please...
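At the Linux layer, at least, the application can declare its random access pattern up front so the kernel's readahead heuristic stays out of the way. A hedged sketch (the file name is made up; POSIX_FADV_RANDOM is the standard advice flag for exactly this case):

```python
import os

path = "random_picks.dat"  # hypothetical tablespace file
with open(path, "wb") as f:
    f.write(os.urandom(64 * 4096))  # 64 pages of 4 KiB

fd = os.open(path, os.O_RDONLY)

# Declare a random access pattern: the kernel should not prefetch
# pages #nnnn+1..#nnnn+15 when we only ever want page #nnnn.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)

page = os.pread(fd, 4096, 17 * 4096)  # one random page pick
os.close(fd)
os.unlink(path)
```

Of course this only tames one of the layers; z/VM and the disk subsystem still apply their own heuristics.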

Now take another example: AIX.

One of the first things the 'O' people will tell you is that you should
disable filesystem caching for the filesystems holding tablespaces, and
let 'O' write directly, with as little interference from the operating
system as possible. Hence the DIO and CIO (Direct I/O and Concurrent
I/O) mount options for JFS2 filesystems.
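Linux's rough analog of AIX's DIO is opening the file with O_DIRECT, which bypasses the kernel page cache entirely. A sketch under the assumption that the filesystem supports direct I/O (not all do, e.g. tmpfs, so the attempt is allowed to fail gracefully; the file name is illustrative):

```python
import mmap
import os

path = "direct_io.dat"  # hypothetical tablespace file
with open(path, "wb") as f:
    f.write(b"\0" * (16 * 4096))

n = 0
try:
    # O_DIRECT: I/O goes straight to the device, skipping the page cache.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    # O_DIRECT requires block-aligned buffers; an anonymous mmap is
    # page-aligned, which satisfies the usual 512-byte/4 KiB alignment.
    buf = mmap.mmap(-1, 4096)
    n = os.preadv(fd, [buf], 0)
    os.close(fd)
    print("direct read of", n, "bytes, page cache bypassed")
except OSError:
    print("O_DIRECT not supported on this filesystem")

os.unlink(path)
```

With O_DIRECT the only remaining caches are the application's own and the disk subsystem's, which is exactly the arrangement argued for above.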

--Ivan

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
