Re: Why sort (was Microprocessor Optimization Primer)

David Crayford Thu, 07 Apr 2016 04:47:32 -0700

On 7/04/2016 6:59 PM, Wayne Bickerdike wrote:

I'm slightly gobsmacked that this discussion is needed. I guess the forest
is lost in the trees.


I can recommend "Principles of Program Design" by Michael Jackson c. 1975.

Of greater concern is the implication that Oracle on AIX outperforms DB2 on
z/OS at our shop. Surely not :(


Do you have real workload benchmarks that prove it, Wayne?

On Thu, Apr 7, 2016 at 2:59 PM, Joel C. Ewing <[email protected]> wrote:

On 04/06/2016 07:01 AM, Andrew Rowley wrote:

On 05/04/2016 01:20 AM, Tom Marchant wrote:

On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote:

A Hashmap potentially allows you to read sequentially and match records
between files, without caring about the order.

Can you please explain what you mean by this? Are you talking about using
the hashmap to determine which record to read next, and so to read the
records in an order that is logically sequential, but physically random?
If so, that is not at all like reading the records sequentially.

If one file fits in memory, you can read it sequentially into a
Hashmap, using the data you want to match as the key.
Then read the second one, also sequentially, retrieving matching
records from the Hashmap by key. You can also remove them from the
Hashmap as they are found if you need to know if any are unmatched.
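
A minimal sketch of the match described above, in Python, with a dict
standing in for the Hashmap. The (key, payload) record layout and the
name match_unsorted are assumptions made for illustration, and keys are
assumed to be unique within each file.

```python
def match_unsorted(small_file, large_file):
    """Match records from two unsorted inputs by key.

    Each input is an iterable of (key, payload) tuples, a hypothetical
    record layout. Keys are assumed unique within each input.
    """
    # Read the file that fits in memory sequentially into a hash map.
    table = {key: payload for key, payload in small_file}

    matched = []
    unmatched_large = []
    # Read the second file sequentially, looking up each key.
    for key, payload in large_file:
        if key in table:
            # Remove on match so the leftovers are the unmatched records.
            matched.append((key, table.pop(key), payload))
        else:
            unmatched_large.append((key, payload))

    # Whatever remains in the table never found a partner.
    unmatched_small = list(table.items())
    return matched, unmatched_small, unmatched_large
```

Neither input needs to be in key order; the only requirement is that
the first one fits in memory.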

But this is a solution for a made up case - I don't know whether it is
a common situation. I was interested in hearing real reasons why sort
is so common on z/OS i.e. Why sort?

On Hashmaps etc. in general - they are the memory equivalent to
indexed datasets (VSAM etc) versus sequential datasets. Their
availability opens up many new ways to process data - and algorithm
changes are often where the big savings can be made.

I believe others have already alluded to the potential time advantage of
processing a large number of updates in key order rather than randomly
when external data is indexed but actually physically ordered by some
key.  The reason why this has historically been the case is that
external disk storage devices which allow random access have
rotational-latency delay and access-head-positioning delay which is
minimized when doing full-track or even multi-track I/O and when
accessing adjacent cylinders.  The way to update the data in minimal
real time is to do the I/O in minimal disk rotations, accessing all data
needed on the same track in one rotation and all data in one cylinder
before moving to an adjacent cylinder. Crucial to this concept is
understanding that z/OS includes support within I/O access methods which
allows applications to successfully exploit the ability of DASD hardware
to transfer one, several, or all data blocks on a track as a single
operation within a single disk revolution.

With emulated DASD and hardware DASD caching, the effects of physical
track and cylinder boundaries may be unknown, but it is still likely
that minimizing repeated visitations to an emulated track or an
emulated cylinder will achieve similar locality of reference on physical
DASD, reduce latency delays and improve the effectiveness of hardware
caching.  Processing transaction records in the same order as the
database records are physically stored on an external file gives the
best odds of grouping transactions needing the same track and cylinder
together and for minimizing I/O delays and minimizing demands on DASD
cache storage and processor storage for file buffers.  Processing
transactions in a different order increases the likelihood that the
needed file data to process the transaction is no longer in processor
memory or disk cache and that at a minimum the time equivalent of
another disk revolution will be required to obtain it.
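
The locality argument can be illustrated with a toy model in Python.
This is only a sketch under stated assumptions: records are laid out in
key order with a fixed number per track (RECORDS_PER_TRACK is an
invented figure), and only the most recently read track stays buffered.
It counts the track reads a batch of updates needs in arrival order
versus key order. It does not model real DASD, caching, or VSAM.

```python
import random

RECORDS_PER_TRACK = 10  # invented figure, for illustration only


def track_reads(keys):
    """Count track reads when only the last-read track stays buffered."""
    reads = 0
    current_track = None
    for key in keys:
        track = key // RECORDS_PER_TRACK  # records are stored in key order
        if track != current_track:
            reads += 1
            current_track = track
    return reads


def compare(num_records=10_000, num_updates=500, seed=1):
    """Return (reads in arrival order, reads in key order) for random updates."""
    rng = random.Random(seed)
    updates = [rng.randrange(num_records) for _ in range(num_updates)]
    return track_reads(updates), track_reads(sorted(updates))
```

In this model the sorted batch touches each needed track exactly once,
so its count can never exceed that of the unsorted batch.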

It was not uncommon with VSAM files for transaction sorting to improve
real-time processing speed sufficiently that the break-even point even
with sorting overhead could be as low as updating only 5% of the
database.  These techniques were common in MVS and its z/OS successor
applications because it was common for those systems to deal with very
large files and databases where tricks like this were necessary in order
to meet constrained nightly batch processing windows.  Since it is
common in z/OS to be dealing with very large files and databases, there
are always files in those environments that are too large to consider
placing the entire file in memory, no matter how large processor memory
becomes.

Hash maps are not really equivalent to VSAM data sets because a VSAM
file is not just indexed, but indexed-sequential, which means once you
have successfully stored records in the file, reading the records in key
order from a VSAM file is just a trivial sequential read.  A hash map
makes it trivial to find a record with a given key, but if you also need
to access the records in key order, a sort of the keys is still
required.  I have applications that have used hash tables in exactly
that way, doing a tag-sort of the keys after the fact to allow ordered
access, but that is not a feature inherent in hash mapped records like
it is with a VSAM data set.
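
A short Python sketch of that after-the-fact key sort, with a dict as
the hash table. The function name and the sample keys are hypothetical.
Note that a Python dict iterates in insertion order, not key order, so
the sort is still needed for ordered access.

```python
def records_in_key_order(table):
    """Yield (key, record) pairs from a hash map in ascending key order."""
    # Sort only the keys (the "tags"), then fetch each record by key.
    for key in sorted(table):
        yield key, table[key]


# Records arrive in arbitrary order and are stored by key.
table = {"C3": "third", "A1": "first", "B2": "second"}
ordered = list(records_in_key_order(table))
```

The records themselves are never moved; only the list of keys is
sorted, which is what makes this a tag sort.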

As you point out, it is possible to process a transaction file
against a database file without either being sorted, by reading records
from one file (presumably the smaller one) into a hash map memory table
and then processing the other file and searching the hash table for
records with matching keys.  This in general could require reading all
records in both files.  While this is an interesting approach and could
even be a reasonable approach in some cases, it doesn't scale well.  It
would be very wasteful for very large databases/files when the
transactions only affect a small percentage of the database records --
and again z/OS is an environment where very large databases and files
are common.
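
The scaling point can be put as rough arithmetic. The Python sketch
below only counts records read; the function name and the sample sizes
are invented for illustration, and the count says nothing about the
relative cost of a sequential read versus a keyed read.

```python
def records_read(master_count, txn_count):
    """Return (hash-match reads, keyed-update reads) as record counts."""
    # Hash match: every record in both files is read once.
    hash_match = master_count + txn_count
    # Keyed update: each transaction plus at most one master record for it.
    keyed_update = 2 * txn_count
    return hash_match, keyed_update


# Hypothetical sizes: transactions touch 1% of a one-million-record file.
hash_match, keyed_update = records_read(1_000_000, 10_000)
```

With these made-up sizes the hash match reads 1,010,000 records, while
keyed access reads at most 20,000.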

--
Joel C. Ewing,    Bentonville, AR       [email protected]

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN

