****************** Very long post warning *****************

I thought I'd posted this before, but didn't see it in the archives. This was originally written as a post to IBM's internal software packaging forum on the topic of block sizes to recommend for different system software data sets.

It's a bit dated now (written when SLED was the order of the day and RAID was new) but I think most of it still applies. One thing it doesn't include that I should have mentioned back then is the overhead of simply needing more CCWs to get the job done; this still applies even though some of the physical delays no longer do. But the time to think through those updates and work them in is gone for this week, so...here 'tis, as it is.

Also, thanks to Darren for his help in allowing this extremely long post.

                      What's a Block, Anyway?

When doing I/O to a tape or DASD device, LRECL is irrelevant. Only the block size matters. This is because each physical record on tape and DASD is what we in software call a block. (This leads to interesting conversations sometimes between hardware and software people. "You gave me a block." "Nope, just one record.") So when we write to one of these devices, we only care about the characteristics of a block.

There are three kinds of blocks: Fixed, Variable, and Undefined. When a fixed block is written, the physical record is always equal to the block size except when there aren't enough records to fill an entire block. In this case, the last block can be a short block. Variable blocks are written on a block-by-block basis, and each block can be a different length.

The length of each variable block is stored in the physical record as the BDW, or Block Descriptor Word, and when there are variable-length records, there's a corresponding RDW, or...you guessed it...Record Descriptor Word. The BDW's length is left as an exercise for the Alert Reader. (Want a hint? The maximum block length for data is 32760 in MVS, not 32768. The actual maximum length of a block itself is 32768, and is limited by (at least) the specification of the block size in the DEB as a signed two-byte field. The hardware limit is established by the Count fields in both Format 0 and Format 1 CCWs, and is 64K.)
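
To make that layout concrete, here's a small Python sketch (nothing IBM ships, just an illustration) that builds and then walks a VB-style block. It assumes the standard, non-spanned format: a 4-byte BDW whose first halfword is the block length including the BDW itself, and a 4-byte RDW in front of each record whose first halfword is the record length including the RDW; the second halfword of each descriptor is simply left zero here.

    import struct

    def build_vb_block(records):
        """Build a VB-style block: a BDW, then an RDW plus data for each record.
        Lengths are big-endian halfwords; the second halfword of each descriptor
        is left zero (it carries spanned-record segment flags, which this sketch
        ignores)."""
        body = b""
        for rec in records:
            body += struct.pack(">HH", len(rec) + 4, 0) + rec   # RDW length includes the RDW
        return struct.pack(">HH", len(body) + 4, 0) + body      # BDW length includes the BDW

    def walk_vb_block(block):
        """Yield the data portion of each logical record in a block."""
        blklen = struct.unpack(">HH", block[:4])[0]
        pos = 4
        while pos < blklen:
            reclen = struct.unpack(">HH", block[pos:pos + 4])[0]
            yield block[pos + 4:pos + reclen]
            pos += reclen

    blk = build_vb_block([b"FIRST RECORD", b"A SECOND, LONGER RECORD"])
    print(len(blk), list(walk_vb_block(blk)))

Counting those descriptor bytes should go a long way toward the Alert Reader's exercise.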

                      Space and Block Length for FB

When fixed blocks are written to DASD, a block is written on a partly-used track only when there is enough space left on that track to hold the entire block; otherwise it starts on a new track. This means that allocating FB data sets with block sizes above half the track length is a guaranteed way to waste lots of space: only one full-size block fits on each track, the balance of the track goes unused, and the next full-size block is written on the following track.
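
Here's a rough way to see the half-track effect with a little Python. The model is deliberately oversimplified: it ignores the per-block Count fields and gaps a real CKD device adds, and just divides a 3390's 56,664-byte single-record track capacity by the block size. The 27,998 figure is the largest block size for which two blocks actually fit on a 3390 track.

    def blocks_per_track(blksize, track_capacity=56_664):
        """Crude model: whole blocks per track, ignoring the per-block Count-field
        and gap overhead a real CKD device adds."""
        return track_capacity // blksize

    def utilization(blksize, track_capacity=56_664):
        return blocks_per_track(blksize, track_capacity) * blksize / track_capacity

    for blk in (27_998, 29_000, 32_760):    # half a track, just over half, the maximum
        print(blk, blocks_per_track(blk), f"{utilization(blk):.0%}")

Going from 27,998 to 29,000 drops the (modeled) track utilization from about 99% to about 51%; once the block size passes half a track, only one full-size block fits per track and the rest of the track is wasted.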

                      Space and Block Length for VB

The above is entirely true for FB, but it's a slight simplification for VB. For VB, the actual average block length will dictate whether space utilization gets worse as the block size rises. This will be a function of the size of the members and the distribution of differently-sized members within each data set. Since every new PDS member starts a new block, if all the members are small a high block size won't actually hurt anything. But if some or all the members are larger than 1/2 track, space utilization will get worse when the block size goes over half a track.

                         Little Blocks are Bad

On the other hand, small block sizes are bad because each physical record on CKD (Count Key Data) DASD, which is what we use in MVS, carries its own overhead: a Count field, an optional Key field, and the gaps between records. The more blocks we write, the more of the track goes to that overhead instead of data. To avoid wasting that space, we want to use large block sizes.

                  Bigger Blocks are Better...to a Point

A reasonable compromise (for FB and VB) between too-short blocks and too-long blocks is half the track length, which minimizes the wastage on average. It's actually a bit more complicated than that (pick up a DASD hardware book and a calculator for the gory details), but DFSMSdfp's System Determined Blocksize, or SDB, takes care of the complication and picks the value nearest half a track that's right for the device, the record format, and the record length specified. More or less, anyway.

For most data sets, this comes very close to optimizing space usage and performance. Not perfect for every data set, mind you, but darn close for the overwhelming majority, and close enough that trying to write code to figure out *all* the intricacies would probably occupy someone in SVL for a lifetime (or two) and is probably light-years from cost-justified. (For some reason, those pesky programmers want to get *paid*. Sheesh!)
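
For the curious, here's roughly what that half-track selection looks like for RECFM=FB, sketched in Python. The 27,998-byte "half track" is the 3390 two-blocks-per-track limit; real SDB works from the device geometry rather than a hard-coded constant, so treat this as an approximation of the idea, not the actual algorithm.

    MAX_BLKSIZE = 32_760    # the MVS block size limit

    def sdb_like_fb_blocksize(lrecl, half_track=27_998):
        """Approximate SDB for RECFM=FB: the largest multiple of LRECL that still
        lets two blocks fit on a track, capped at 32760."""
        limit = min(half_track, MAX_BLKSIZE)
        if lrecl > limit:
            return lrecl    # a guess: one record per block; real SDB has its own rules here
        return (limit // lrecl) * lrecl

    print(sdb_like_fb_blocksize(80))      # 27920, the familiar 3390 answer for card-image data
    print(sdb_like_fb_blocksize(1024))    # 27648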

           Distribution of Member Sizes and Loading Order

However, the mix of block sizes and the order the members are loaded in can make SDB less-than-optimum for some data sets. (Remember, too, that each PDS member starts a new block, there are short blocks to think about, and some data sets are VB.) The only consistent exception I've seen, though, is fonts (as [the then-owner of the packaging rules] says, "Fonts are *always* different"), and [the then-owner of the font FMIDs]'s got the numbers to prove this to anyone who doubts it. Unless you're really into pain, I suggest *not* asking [her] to give you the numbers. They'll give you a headache. Really.

                    Use SDB...Most of the Time

So, fonts aside, SDB is *really* likely to be the right block size to tell customers to use when allocating *almost* anything but load libraries. There are other exceptions, too, like UADS, but none of them are really software libraries. If you're doing pretty standard stuff, use SDB. If you're doing something weird (and UADS is pretty weird), check to see if one of your libraries is an exception to the rule.

               But--Wasn't This About Load Libraries?

Oh, yeah; those things! Load libraries containing load modules have an undefined record format, RECFM=U. And their blocks are also, well, um, undefined. They're written however the owner of the code that writes them thinks they should be.

So there are no rules for Undefined blocks as a group. And there are data sets using RECFM=U that aren't load libraries. I haven't talked to the people that use such libraries, and have no idea what block sizes might be optimum for them, individually or as a group. (I'm not sure I even *want* to know.) Happily, nobody is shipping any of these for system software, so I don't have to understand them--yet.

But I did pester the owners of IEBCOPY, the linkage editor, and Program Fetch at some length about load module block sizes. Several times. I think I even understand most of what they've told me now. Sorta scary, that, when I think about it.

                  Kinds of Load Module Records

Load modules (not Program Objects, which are stored only in PDSEs) are each made up of a number of records. There are one or more ESD records, which are used by Program Fetch to resolve external symbols. There are also RLD records, used to resolve relocatable address constants. Then there are IDR records, and Control records. These are all typically short. RLD and Control records are interspersed throughout load modules, while ESD and IDR records are at the beginning of load modules.

Then there are Text records, which make up the bulk of most load
modules. These contain the executable code, funny-looking machine language stuff.

         Maximum and Minimum Block Sizes for Text Records

When COPYMOD or the linkage editor writes a load module to a data set, the allocation block size sets the *maximum* block size. Short blocks are always written for RLD, ESD, Control, and IDR records. More to the point, while writing Text, RLD, and Control records, a TRACKBAL macro is issued before writing each block to see how much space is left on the track. If there's enough space, a block is written that's as long as the remaining space on the track or the maximum block size, whichever is smaller.

There is also a *minimum* size of block that either utility will write: the smallest block the linkage editor and binder will try to write for text records is 1024 bytes.

                     Writing Text Records

When the space left on the track is more than the minimum block length (1024 bytes), but less than the maximum block length, and the text left to be written is more than 2048 bytes long, the text can be split. What will fit on the track becomes the last block of the track, and what won't fit on the track becomes the first part of the first record, or the entire first record, on the next track. This process is repeated for each block until the end of the load module is reached. The next load module starts in a new block, right after the block in which the previous one ended.

So COPYMOD and the linkage editor do their best to stuff every byte that will fit onto every track. Pretty neat, huh? *Someone* was on the ball when this code was written!
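
To show the shape of that logic, here's a toy Python simulation. Everything device-specific in it is an assumption: the 55,000-byte "usable text per track" is just a round number, and the split rule is simplified. In particular, the model ignores the per-record Count-field and gap overhead, which is the real reason smaller maximum block sizes also cost space on an actual device; what it does show is how the record count balloons as the maximum block size shrinks.

    def write_text(text_len, max_blksize, track_capacity=55_000, min_block=1_024):
        """Toy model of text-block writing: check the space left on the track
        before each write (the TRACKBAL idea), write up to the smaller of that
        space and the maximum block size, and split the text across tracks when
        the leftover piece is at least the 1024-byte minimum.
        Returns (blocks written, tracks used)."""
        blocks, tracks, track_left, remaining = 0, 1, track_capacity, text_len
        while remaining > 0:
            fits = min(remaining, max_blksize, track_left)
            if fits < min(min_block, remaining):
                tracks += 1                  # too little room left; start a fresh track
                track_left = track_capacity
                continue
            blocks += 1
            remaining -= fits
            track_left -= fits
        return blocks, tracks

    for maxblk in (6_144, 32_760):           # a ~400K module at two maximum block sizes
        print(maxblk, write_text(400_000, maxblk))

With these assumptions, both block sizes fill the same eight tracks, but 6144 takes 66 records to do it where 32760 takes 15; the space penalty of the extra records only shows up once you put back the real per-record overhead the model leaves out.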

                 Performance and Space Utilization

How much large load library block sizes help performance and space usage depends on how big the load modules in the data set are. For example, the CSSLIB library is composed of small load modules, all of which are currently 4K or smaller in size. Increasing the block size of this data set past 4K does no good at all. But--neither does it hurt. The same blocks will be written in the same spots for any block size greater than or equal to 4K.

On the other hand, this matters a great deal for, say, LINKLIB, which has lots of big load modules. Space usage and performance keep getting better right up to the 32760 block size limit. The same is true of lots of load libraries. Since 32760 never hurts, and lower block sizes can, just recommending 32760 provides a single, consistent value customers can use that's very often right, and *never* wrong. (I originally qualified this statement, but despite many challenges over a 5-year period, nobody has found an exception yet.)

                           So What?

But why does this matter? After all, DASD is cheaper by the day (and, Hey!, we sell that stuff, too, don't we?).

Well, OS/390 takes up more than four 3390-3 volumes now, and it's still not shrinking. I tested one data set in 1999, for DFSORT, and found a 20% reduction in the space used when the library was blocked at 32760 vs. 6144.

20% is significant.

                    Program Fetch Performance

On "native" (non-emulated) DASD, another significant thing is the
corresponding 20% reduction in head switching (to read another track, you've got to use another magnetic read head in the disk drive), which in turn is a 1 1/3% reduction in seek time. 1 1/3% might not sound like much, but a seek takes at least 1.5ms, which is a Long Time to a computer. And the head switches and seeks can take a *lot* longer than 1.5ms. Why, you ask?

Well, *since* you asked, Program Fetch tries to get a program off DASD all at once. It doesn't know how long the module is when it starts, so it gets the first few records in the first shot. They tell it if there are more records, and later records can likewise tell it about still more records to fetch. Then, on the fly, it inserts CCWs into the channel program to read each successive record. It does this using a Program-Controlled Interrupt (PCI) design. If the processor is busy and the Fetch task isn't dispatched soon enough to insert the next CCW into the channel program, the channel program ends prematurely.

This is a Bad Thing. The disk won't wait, and keeps spinning. By the time the I/O is redriven, it's probably too late to catch the next record without waiting...for...the...disk...to...turn...all...the...way...around. This takes 14ms on native 3390 DASD. This is 1,673 Dog Years to a computer.

Having to wait for the disk to turn around, taking its sweet time, is called an RPS miss. (No, not *that* RPS. No trucks are involved. This RPS stands for Rotational Position Sensing.) (Note: DASD control unit cache reduces the probability of RPS misses for data on the same track.) We really hate it when this happens. That low rumble you hear is users grumbling about response time.

The probability of an RPS miss goes up with the number of records used to write a load module. Because COPYMOD and the linkage editor will always write a record if they can, block sizes below 32760 just make them write more records than they have to. So the performance improvements are even greater than the space utilization improvements, when you care the most about performance--that is, when the system is Really Busy.
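
As a back-of-the-envelope illustration (the miss probability here is entirely made up and just stands in for "how busy the system is"), the expected redrive delay grows linearly with the number of records; the 15-versus-66 record counts are the ones the toy track-filling sketch above produced for a roughly 400K module at the two block sizes.

    REVOLUTION_MS = 14.0    # one full turn of a native 3390, as above

    def expected_redrive_delay_ms(records, miss_probability):
        """If each record read carries some chance of a missed PCI (and so an RPS
        miss costing about one revolution), the expected extra delay grows
        linearly with the record count."""
        return records * miss_probability * REVOLUTION_MS

    for nrecs in (15, 66):    # 32760 vs. 6144 blocking, per the toy sketch above
        print(nrecs, f"{expected_redrive_delay_ms(nrecs, 0.05):.1f} ms expected extra delay")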

When the system is lightly loaded, smaller block sizes only hurt performance through the greater number of tracks to read, which costs just a couple of percent...but Program Fetch gets used a *lot*.

What about newer DASD devices? Well, they do head-switching under the covers that is not apparent to the operating system, and there are probably mini-RPS misses happening that are handled by the control unit microcode. But these are not under our control. It is still true that larger block sizes lower the probability of missed PCI interrupts and having to redrive the I/O. So larger block sizes still mean better performance.

--
John Eells
z/OS Technical Marketing
IBM Poughkeepsie
[EMAIL PROTECTED]
