Rob van der Heij wrote:
One of the neat things about Diag250 is that it supports a fast path when the data is in MDC: the I/O completion is returned immediately, skipping the entire process of reflecting an interrupt later. Linux supports that, in that you also get a quick route back. I've even done Linux multipathing on top of that, to exploit it the way PAV does. But I have been pushing this less lately, after I was disappointed by some low-hanging fruit that z/VM was unable to pick...
Ah! MDC! Minidisk Caching is definitely a double-edged sword! I'm quite certain (if only intuitively) that some environments would benefit from disabling MDC altogether. Lemme explain...

Take a well-known RDBMS engine (the one that starts with 'O'). One thing is certain: it is the entity that knows best (hopefully) which pages to retain and which to flush, right? Now look at what we have:

O <-> z/Linux <-> z/VM <-> DSxxxx

(and I'm not talking about what happens when you have an SVC in the middle!) Then you suddenly realize that all these guys implement some form of caching: some read caching, some write-back caching, some write-through caching. Basically, your data may reside in 5 places - 4 caches, 1 permanent - maybe more - with little or no coordination between the various stages. That's a waste of resources to me! Ideally - and when the application does its own caching - the only other caching left would be at the disk subsystem, since it is the only layer that knows the actual physical layout and can re-order writes to achieve the best performance with respect to that layout. And without coordination, whatever caching the underlying layers perform is driven by heuristics that will not necessarily match the actual usage pattern.

Now take read-ahead. The various layers may each, in turn, perform some read-ahead for a series of pages or blocks, while the application may be doing purely random page picks (and may never need 'Page+1' after having read 'Page') - and now we're talking wasted I/O channel bandwidth and increased latency if another I/O is needed:

- O -> z/Linux : I want page #nnnn
- z/Linux -> z/VM : OK, lemme get you page #nnnn (and I'll get you pages #nnnn+1 to #nnnn+15 while I'm at it)
- z/VM -> DSxxxx : OK, you want pages #nnnn to #nnnn+15
- DSxxxx -> Disks : OK...
You want pages #nnnn to #nnnn+15
- DSxxxx -> z/VM : Got you page #nnnn
- z/VM -> z/Linux : Got you page #nnnn
- z/Linux -> O : Got you page #nnnn
- O -> z/Linux : Now get me page #mmmm
- z/Linux : Err... listen... I'm not done getting pages #nnnn+1 to #nnnn+15 yet... wait a bit, please...

Now take another example: AIX. One of the first things the 'O' people will tell you is that you should disable caching for those filesystems holding tablespaces, and permit 'O' to write directly with as little interference from the operating system as possible. Hence the DIO and CIO (Direct I/O and Concurrent I/O) mount options for JFS2 filesystems.

--Ivan

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390
or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
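[Editor's note] The read-ahead mismatch Ivan describes can at least be suppressed at the z/Linux layer. A minimal sketch (the file name and page size are illustrative, not from the thread) using posix_fadvise to declare a random access pattern, after which the Linux kernel stops reading ahead on that file descriptor:

```python
import os

PAGE = 4096  # illustrative "database page" size

# Build a small 16-page file to stand in for a tablespace.
with open("tablespace.dat", "wb") as f:
    f.write(b"\0" * (16 * PAGE))

fd = os.open("tablespace.dat", os.O_RDONLY)
# Declare random access: the kernel then skips the speculative
# read-ahead of pages #nnnn+1 to #nnnn+15 on this descriptor.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
data = os.pread(fd, PAGE, 7 * PAGE)  # fetch exactly one page
os.close(fd)
print(len(data))
```

Of course this only influences the z/Linux layer; z/VM's MDC and the DSxxxx cache still apply their own heuristics, which is precisely the coordination problem described above.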
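[Editor's note] DIO and CIO are AIX/JFS2 mount options; the closest Linux analogue, sketched here under assumptions (file name and block size are invented; some filesystems, e.g. tmpfs, reject O_DIRECT, which the sketch tolerates), is opening the file with O_DIRECT so I/O bypasses the page cache entirely. O_DIRECT requires block-aligned buffers and transfer sizes; an anonymous mmap provides a page-aligned buffer:

```python
import mmap
import os

BLOCK = 4096  # O_DIRECT needs block-aligned size, offset and buffer

with open("direct.dat", "wb") as f:
    f.write(b"y" * BLOCK)

buf = mmap.mmap(-1, BLOCK)  # anonymous mmap => page-aligned buffer
try:
    fd = os.open("direct.dat", os.O_RDONLY | os.O_DIRECT)
    nread = os.preadv(fd, [buf], 0)  # data moves disk -> buf, no page cache
    os.close(fd)
except OSError:
    nread = -1  # filesystem without O_DIRECT support (e.g. tmpfs)
print(nread)
```

This is what the 'O' engine's own direct-path I/O amounts to on Linux: the application keeps its buffer cache, and the operating-system cache drops out of the stack.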
