On Thu, Feb 19, 2009 at 11:47 AM, Ivan Warren <[email protected]> wrote:
> Rob van der Heij wrote:
>>
>> One of the neat things about Diag250 is that it supports a fast-path
>> when the data is in MDC (the I/O complete is returned immediately,
>> skipping the entire process of reflecting an interrupt later). Linux
>> supports that in that you also get a quick route back there. I've even
>> done Linux multi-pathing on that to exploit it like PAV does. But I
>> have been pushing this less lately after I was disappointed about some
>> low-hanging fruit that z/VM was unable to pick...
>>
> Ah ! MDC !
>
> Minidisk Caching is definitely a double-edged sword ! I'm quite certain
> (only intuitively) there are some environments that would benefit from
> disabling MDC altogether.
>
> Lemme explain..
>
> Take a well known RDBMS engine (the one that starts with 'O'). Well...
> One thing is certain : It is the entity that will know best (hopefully)
> what pages to retain and what pages to flush right ?
> Now look at what we have :
>
> O<->z/Linux<->z/VM<->DSxxxx (and I'm not talking about what happens when
> you have an SVC in the middle !)
>
> Then you suddenly realize all these guys are implementing some form of
> caching. Some Read caching, some write-back caching and some
> write-through caching. Basically, your data may reside in 5 places : 4
> cache, 1 permanent - maybe more - with little or no coordination between
> the various stages. That's a waste of resource to me ! Ideally - and
> when the application does its own caching - the only other caching left
> would be at the Disk Subsystem (since it is the only one that knows the
> actual physical layout and can re-order writes to achieve the best
> performance with respect to that physical layout).

Don't we all wish to be managed by only a single manager. But reality
is that you have different agendas. On a global level you know what
resources are available for use, and on a local level you may know
best how to use those resources. Unless the API is enhanced with hints
and suggestions, heuristics are our best offer.

> And without coordination, whatever caching will be performed by
> underlying layers is going to be through some heuristics that will not
> necessarily the actual usage need.

The complication here is that you have several resources that are
consumed in a sort of serialized manner. While it may appear silly to
add more lanes to part of the route, it does make sense because not
everyone follows the entire route in the same manner. The entire
motivation behind PAV, for example, is that you provide multiple paths
to the control unit cache, so an I/O that involves the back-end device
does not also lock out a large part of the cache. Similarly, while the
real device is engaged in I/O, MDC can still be referenced when
operations don't interfere.

> Now take read-ahead.. The various layers may perform - each in turn -
> some read-ahead... for a series of pages or blocks when the application
> may be solely doing random page picks (and may never need 'Page+1' after
> having read 'Page') - and now we're talking waste of I/O channel
> bandwidth and increased latency if another I/O is needed :

It's clear that track-based read-ahead provides limited value for
random I/O. For MDC you use record level cache when the I/O is not
likely sequential and/or track oriented, and some DASD subsystems will
understand the bits in the channel program that MVS uses to indicate
whether I/O is sequential and to what extent (no pun intended). Though
the Linux dasd driver is known to abuse that mechanism, justified by
unrealistic performance measurements...

It really is a trade-off.
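
To put some shape on the track versus record argument, here is a toy
model in Python. It is only a sketch under made-up assumptions: the
geometry (12 records per track), the cache size, the LRU policy and
the names simulate() and random_accesses() are all invented for the
illustration, and none of it reflects how MDC or any control unit
actually works. It counts cache hits and the number of records moved
from the back-end when a miss stages either the whole track or just
the requested record.

```python
import random
from collections import OrderedDict

# All numbers below are assumptions for illustration only.
RECORDS_PER_TRACK = 12     # hypothetical track geometry
TOTAL_RECORDS = 120_000    # size of the hypothetical minidisk, in records
CACHE_RECORDS = 1_200      # cache capacity, in records


def simulate(accesses, track_mode):
    """Run an LRU cache over `accesses`; return (hits, records_transferred).

    track_mode=True stages the whole track on a miss (read-ahead);
    track_mode=False stages only the requested record.
    """
    cache = OrderedDict()  # record number -> None, kept in LRU order
    hits = 0
    transferred = 0
    for rec in accesses:
        if rec in cache:
            cache.move_to_end(rec)   # mark as most recently used
            hits += 1
            continue
        if track_mode:
            start = rec - rec % RECORDS_PER_TRACK
            staged = range(start, start + RECORDS_PER_TRACK)
        else:
            staged = (rec,)
        for r in staged:
            cache[r] = None
            cache.move_to_end(r)
            transferred += 1         # every staged record costs bandwidth
        while len(cache) > CACHE_RECORDS:
            cache.popitem(last=False)  # evict the least recently used record
    return hits, transferred


def random_accesses(count, seed):
    """Uniformly random record numbers, reproducible via `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(TOTAL_RECORDS) for _ in range(count)]


if __name__ == "__main__":
    workloads = {
        "sequential": list(range(12_000)),
        "random": random_accesses(12_000, seed=1),
    }
    for name, accesses in workloads.items():
        for track_mode in (True, False):
            hits, moved = simulate(accesses, track_mode)
            mode = "track" if track_mode else "record"
            print(f"{name:10s} {mode:6s} hits={hits:6d} transferred={moved:7d}")
```

In this model a sequential scan moves the same number of records
either way, but track mode turns 11 out of every 12 references into
cache hits. With uniformly random references and a cache this small,
almost every reference misses in both modes, so track mode moves close
to 12 times as many records for little gain. Real workloads sit
somewhere in between, which is why the sequential hints matter.
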
You can often do things more efficiently on the lowest level, but the
highest level ideally knows best what makes most sense. Depending on
how much more efficient and how good one can WAG, there may be
multiple truths involved.

Rob
--
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
