(Belatedly realized I hadn't sent this to the list server yesterday...)

OK, though I know I posted this before, I searched the archives to
no avail.  So, I dug this up, dusted it off, and did a *very* quick,
light edit on it that might have introduced errors or missed things
that have changed since (or not).  I hope the formatting will survive
(we'll see).

Nonetheless, the basic concepts are the same, so:

I have seemingly addressed this concern about once a month for over
fifteen years.  So...this is probably worth posting again.  I've
edited it for readability, but there are only minor updates.  So if
you have read it before, you probably don't need to read it again.
Also, it has NOT been updated for modern disk drives or controllers.

I got a note in May 1997 from Marna, which read in part:

> Going back through my notes at the Packaging Council, I see that
> there was a question on whether 32760 was REALLY the best
> blocksize for RECFM=U datasets on 3380 and 3390 DASD.  I told
> them that it was, but they wanted more information.
>
> Of course, your name was brought up as the one that has gathered
> the most information on this subject!!  If you say it, they will
> believe it.
>
> Could you please put an append on the [IBM internal] forum to assure
> the council that this is an efficient blocksize for RECFM=U on
> 3380 and 3390? *

[tongue_in_cheek]
I'm always willing to accommodate a reasonable request, so here we
go: 32760 is the best block size to use when allocating load
libraries!  OK?  Glad we've got *that* out of the way...whew!
[/tongue_in_cheek]

OK, OK, for better or for worse, experience has shown that this
is a crowd that usually wants more detail in place of a simple
statement.  If you still have questions about this, read on, but
BE WARNED!  This will NOT be a short explanation.  And I'll also
go into block sizes for libraries other than load libraries.

I'll even try to make parts of this entertaining!  (Wish me
luck...)

------------------------------------------------------------------------

                      What's a Block, Anyway?

When doing I/O to a tape or DASD device, LRECL is irrelevant.  Only
the block size matters.  This is because each physical record on tape
and DASD is what we in software call a block.  (This leads to
interesting conversations sometimes between hardware and software
people.  "You gave me a block."  "Nope, just one record.")  So when
we write to one of these devices, we only care about the
characteristics of a block.

There are three kinds of blocks: Fixed, Variable, and Undefined.
When a fixed block is written, the physical record's length is always
equal to the block size, except when there aren't enough records left
to fill an entire block.  In that case, the last block can be a short
block.
Variable blocks are written on a block-by-block basis, and each
block can be a different length.
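
As a quick illustration of FB blocking, here's a sketch in Python
(the numbers are made up for the example):

```python
lrecl = 80       # logical record length
blksize = 27920  # an FB block size; must be a multiple of LRECL (349 * 80)
nrecs = 1000     # logical records in the data set

# Each full block holds blksize // lrecl records; the remainder, if any,
# goes into one final short block.
recs_per_block = blksize // lrecl
full_blocks, leftover = divmod(nrecs, recs_per_block)

print(full_blocks)         # 2 full blocks of 27920 bytes each
print(leftover * lrecl)    # one short last block of 302 * 80 = 24160 bytes
```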

The length of each variable block is stored in the physical record as
the BDW, or Block Descriptor Word, and when there are variable-length
records, there's a corresponding RDW, or...you guessed it...Record
Descriptor Word.  The BDW's length is left as an exercise for the Alert
Reader.  (Want a hint?  The maximum block length for data is 32760 in
MVS, not 32768.  The actual maximum length of a block itself is 32768,
and is limited by (at least) the specification of the block size in
the DEB as a signed two-byte field.  The hardware limit is established
by the Count fields in both Format 0 and Format 1 CCWs, and was 64K
when this was written originally.) **

                      Space and Block Length for FB

When fixed blocks are written to DASD, they start on a new track or
continue on a partly-used track only when there is enough space left
on the track to write the entire block.  This means that allocating FB
data sets with block sizes above half the track length is a guaranteed
way to waste lots of space.  Every time two full-size blocks follow
one another, the balance of the track will be unused and the second
full-size block written on the next track.
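
A back-of-the-envelope model of that waste (Python; the 56,664-byte
3390 track capacity is a simplification that ignores per-block
overhead, which in real life reduces what fits on a track):

```python
TRACK = 56664  # simplified 3390 user-data bytes per track (overhead ignored)

def track_utilization(blksize):
    """Fraction of a track used when full-size FB blocks are written
    back to back.  FB blocks never span tracks, so whatever won't hold
    another whole block is wasted."""
    blocks_per_track = TRACK // blksize
    return blocks_per_track * blksize / TRACK

print(track_utilization(27998))  # just under half a track: 2 blocks/track
print(track_utilization(32760))  # over half a track: only 1 block/track
```

In this model a just-under-half-track block size uses about 99% of
each track, while anything over half a track strands more than 40% of
every track behind a single block.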

                      Space and Block Length for VB

The above is entirely true for FB, but it's a slight simplification
for VB.  For VB, the actual average block length, distribution of
block lengths, and order of the blocks will dictate whether space
utilization gets worse as the block size changes.  Since every new
PDS member starts a new block, if all the members are small a high
block size won't actually hurt anything.  But if some or all the
members are larger than 1/2 track, space utilization will get worse
when the block size goes over half a track.

                         Little Blocks are Bad

On the other hand, short block sizes are bad because space is wasted
in between each record for Count and Key fields on CKD (Count, Key,
Data physical record formats) DASD, which is what we use in MVS.  To
avoid wasting space between the records, we want to use high block
sizes.  (This is a somewhat simplified view, ignoring cell sizes, etc.)

                  Bigger Blocks are Better...to a Point

A reasonable compromise (for FB and VB) between too-short blocks and
too-long blocks is half the track length, which minimizes the wastage
on average.  It's actually a bit more complicated than that (pick up a
DASD hardware book and a calculator for the gory details), but
DFSMSdfp's System Determined Blocksize, or SDB, takes care of the
complication and picks the value nearest half a track that's right
for the device and the block size specified.  More or less, anyway.
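
A naive sketch of the half-track idea SDB approximates (Python; real
SDB accounts for device geometry and block overhead, which this
ignores, so its answers differ a bit):

```python
TRACK = 56664  # simplified 3390 user-data bytes per track (overhead ignored)

def half_track_blksize(lrecl, track=TRACK, max_blksize=32760):
    """Largest multiple of LRECL that fits in half a (simplified) track,
    capped at the 32760 MVS maximum."""
    half = min(track // 2, max_blksize)
    return (half // lrecl) * lrecl

# This simplified model says 28320 for LRECL=80; real SDB picks 27920
# on 3390, because half of a *real* track holds less once per-block
# overhead is counted.
print(half_track_blksize(80))
print(half_track_blksize(133))
```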

For most data sets, this comes very close to optimizing space usage
and performance.  Not perfect for every data set, mind you, but darn
close for the overwhelming majority, and close enough that trying to
write code to figure out *all* the intricacies would probably occupy
someone in STL for a lifetime (or two) and is probably light-years
from cost-justified.  (For some reason, those pesky programmers want
to get *paid*.  Sheesh!)

           Distribution of Member Sizes and Loading Order

However, the mix of block sizes and the order the members are loaded
in can make SDB less-than-optimum for some data sets.  (Remember,
too, that each PDS member starts a new block, there are short blocks
to think about, and some data sets are VB.)  The only consistent
exception I've seen, though, is fonts (as the former IBM Packaging
Rules owner said, "Fonts are *always* different"), and [the font
packager] has got the numbers to prove this to anyone who doubts
it.  Unless you're really into pain, I suggest *not* asking her to
give you the numbers.  They'll give you a headache.  Really.

                    Use SDB...Most of the Time

So, fonts aside, SDB is *really* likely to be the right block size
to tell customers to use when allocating *almost* anything but load
libraries.  There are other exceptions, too, like UADS, but none of
them are really software libraries.  If you're doing pretty standard
stuff, use SDB.  If you're doing something weird (and UADS is pretty
weird), check to see if one of your libraries is an exception to the
rule.

               But--Wasn't This About Load Libraries?

Oh, yeah; those things!  Load libraries containing load modules have
an undefined record format, RECFM=U.  And their blocks are also, well,
um, undefined.  They're written however the owner of the code that
writes them thinks they should be.

So there are no rules for Undefined blocks as a group.  And there are
data sets using RECFM=U that aren't load libraries.  I haven't talked
to the people that use such libraries, and have no idea what block
sizes might be optimum for them, individually or as a group.  (I'm
not sure I even *want* to know.)  Happily, nobody is shipping any of
these for system software, so I don't have to understand them--yet.

But I did pester the owners of IEBCOPY, the linkage editor, and Program
Fetch at some length about load module block sizes.  Several times.  I
think I even understand most of what they've told me now.  Sorta scary,
that, when I think about it.

                  Kinds of Load Module Records

Load modules (not Program Objects, which are stored only in PDSEs)
comprise a number of records each.  There are one or more ESD records,
which are used by Program Fetch to resolve external symbols.  There
are also RLD records, used to resolve relocatable address constants.
Then there are IDR records, and Control records.  These are all
typically short.  RLD and Control records are interspersed throughout
load modules, while ESD and IDR records are at the beginning of load
modules.

Then there are Text records, which make up the bulk of most load
modules.  These contain the executable code, funny-looking machine
language stuff.

         Maximum and Minimum Block Sizes for Text Records

When COPYMOD or the linkage editor writes a load module to a data set,
the allocation block size sets the *maximum* block size.  Short blocks
are always written for RLD, ESD, Control, and IDR records.  More to
the point, while writing Text, RLD, and Control records, a
TRACKBAL macro is issued before writing each block to see how much
space is left on the track.  If there's enough space, a block is
written that's up to the remaining space on the track long, or as long
as the maximum block size, whichever is smaller.

There is also a minimum block size that either utility will write,
which sets the *minimum* size of a text block.  The minimum text
block that the linkage editor and binder will try to write is 1024
bytes.

                     Writing Text Records

When the space left on the track is more than the minimum block length
(1024 bytes), but less than the maximum block length, and the text
left to be written is more than 2048 bytes long, the text can be
split.  What will fit on the track becomes the last block on the
track, and what won't fit on the track becomes the first part of the
first record, or the entire first record, on the next track.  This
process is repeated for each block until the end of the load module is
reached.  The next load module starts in a new block, right after
the block in which the previous one ended.
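
The track-filling logic described above can be sketched roughly like
this (Python; a loose model of what COPYMOD and the linkage editor do
with TRACKBAL, not their actual algorithm, and it simplifies the
minimum-block and 2048-byte splitting rules):

```python
MIN_BLOCK = 1024  # minimum text block either utility will try to write
TRACK = 56664     # simplified per-track capacity, overhead ignored

def write_text(total_text, max_blksize, space_left=TRACK):
    """Return the list of text block lengths written for one module,
    splitting blocks at track ends so every track is filled."""
    blocks = []
    remaining = total_text
    while remaining > 0:
        if space_left < MIN_BLOCK:      # too little room left: new track
            space_left = TRACK
        # Write up to the remaining track space or the max block size,
        # whichever is smaller (and never more than is left to write).
        blk = min(remaining, max_blksize, space_left)
        blocks.append(blk)
        remaining -= blk
        space_left -= blk
    return blocks

print(write_text(70000, 32760))  # [32760, 23904, 13336]: track ends force splits
```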

So COPYMOD and the linkage editor do their best to stuff every byte
that will fit onto every track.  Pretty neat, huh?  *Someone* was on
the ball when this code was written!

                 Performance and Space Utilization

How much high load library block sizes help out performance and space
usage depends on how long the load modules in the data set are.  For
example, the CSSLIB library is composed of small load modules, all of
which are currently 4K or smaller in size.  Increasing the block size
of this data set past 4K does no good at all.  But--neither does it
hurt.  The same blocks will be written in the same spots for any block
size greater than or equal to 4K.

On the other hand, this matters a great deal for, say, LINKLIB, which
has lots of big load modules.  It keeps getting better right up to the
32760 block size limit.  The same is true of lots of load libraries.
Since 32760 never hurts, and lower block sizes can, just recommending
32760 provides a single, consistent value customers can use that's
very often right, and *never* wrong.  (I originally qualified this
statement, but despite many challenges over more than a decade and
a half, nobody has found an exception yet.*)

                           So What?

But why does this matter?  After all, DASD is cheaper by the day (and,
Hey!, we sell that stuff, too, don't we?).

Well, OS/390 takes over four 3390-3 volumes now, and it's still not
shrinking.  I tested one data set in 1999, for DFSORT, and found a
20% reduction in space utilization when the library was blocked at
32760 vs. 6144.  20% is significant.

                    Program Fetch Performance

On "native" (non-emulated) DASD, another significant thing is the
corresponding 20% reduction in head switching (to read another track,
you've got to use another magnetic read head in the disk drive), which
in turn is a 1 1/3% reduction in seek time.  1 1/3% might not sound
like much, but a seek takes at least 1.5ms, which is a Long Time to a
computer.  And the head switches and seeks can take a *lot* longer
than 1.5ms.  Why, you ask?

Well, *since* you asked, Program Fetch tries to get a program off DASD
all at once.  It doesn't know how long the module is when it starts,
so it gets the first few records in the first shot.  They tell it if
there are more records, and later records can likewise tell it about
still more records to fetch.  Then, on the fly, it inserts CCWs into
the channel program to read each successive record.  It does this using
a Program-Controlled Interrupt (PCI) design.  If the processor is busy,
and the Fetch task isn't dispatched in time to insert the next CCW into
the channel program in time, the channel program ends prematurely.

This is a Bad Thing.  The disk won't wait, and keeps spinning.  By the
time the I/O is redriven, it's probably too late to catch the next
record without
waiting...for...the...disk...to...turn...all...the...way...around.
This takes 14ms on native 3390 DASD.  This is 1,673 Dog Years to a
computer.

Having to wait for the disk to turn around, taking its sweet time, is
called an RPS miss.  (No, not *that* RPS.***  No trucks are involved.
This RPS stands for Rotational Position Sensing.)  (Note: DASD control
unit cache reduces the probability of RPS misses for data on the same
track.)  We really hate it when this happens.  That low rumble you
hear is users griping about response time.

The probability of an RPS miss goes up with the number of records used
to write a load module.  Because COPYMOD and the linkage editor will
always write a record if they can, block sizes below 32760 just make
them write more records than they have to.  So the performance
improvements are even greater than the space utilization
improvements, when you care the most about performance--that is, when
the system is Really Busy.
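
To see why smaller block sizes mean more records (and so more chances
to miss a PCI), a rough lower-bound count (Python; this ignores
track-boundary splitting and the non-text records, both of which only
add records, and the 1 MB module size is hypothetical):

```python
import math

def min_text_records(module_size, blksize):
    """Lower bound on the number of text records needed to hold one
    load module's text at a given maximum block size."""
    return math.ceil(module_size / blksize)

for blksize in (6144, 18452, 32760):
    print(blksize, min_text_records(1_000_000, blksize))
# A 1 MB module takes at least 163 records at 6144 but only 31 at
# 32760: roughly 5x fewer CCWs to insert on the fly, and 5x fewer
# chances to miss a PCI.
```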

When the system is lightly loaded, performance is only worsened by the
greater number of tracks to read, which is only a couple percent...but
Program Fetch gets used a *lot*.  What's a couple percent?  Customers
would sell you down the river in a heartbeat for a couple percent
overall improvement on a 9021-9X2...that's as much processing power
as an entire 9221-200 *has*.  (They wouldn't get it from this alone,
of course, but this could be a contributing factor in such an
improvement.)  Another, say, 150 TSO/E users could fit on the box at
no extra cost for a paltry 2%.

What about newer DASD devices?  Well, they do head-switching under
the covers that is not apparent to the operating system, and there
are probably mini-RPS misses happening that are handled by the
control unit microcode.  But these are not under our control.  It is
still
true that larger block sizes lower the probability of missed PCI
interrupts and having to redrive the I/O.  So larger block sizes
still mean better performance.

* All this led to a number of changes in IBM product packaging
(RELFILE block sizes and recommended allocation block sizes for load
modules), IEBCOPY (PARM=SPCLCMOD), SMP/E (which now uses COPYMOD),
and ServerPac (which now sets block sizes for you).
** The maximum block size for tape is now larger.  The Large Block
Interface (LBI) supports up to 256K blocks for tape.
*** RPS was a trucking company that merged with FedEx in 2001.



--
John Eells
z/OS Technical Marketing
IBM Poughkeepsie
[email protected]

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
