Re: Why sort (was Microprocessor Optimization Primer)

Jesse 1 Robinson Thu, 07 Apr 2016 10:53:16 -0700

An excellent synopsis of mainframe history. It follows that most mature shops 
use SORT extensively because until recently, the platform pretty much required 
it for reasonable performance as measured by wall clock. One could argue--maybe 
even prove--that today's DASD allows more random updating than in the days of 
yore, but a mature shop that has orchestrated batch around sorting would find 
it a hard sell to convince business units (i.e. paying customers) to reengineer 
massive production processes just because it's possible.

We explored TVS (Transactional VSAM) in ESP some years ago. As wonderful as it 
sounded--and probably was--the target applications folks balked at having to 
redesign their update programs because the processing logic is totally 
different. Unfortunately, I think they moved off of mainframe instead. ;-(( 

.
.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler 
SHARE MVS Program Co-Manager
323-715-0595 Mobile
626-302-7535 Office
robin...@sce.com

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Joel C. Ewing
Sent: Wednesday, April 06, 2016 9:59 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: (External):Re: Why sort (was Microprocessor Optimization Primer)

On 04/06/2016 07:01 AM, Andrew Rowley wrote:
> On 05/04/2016 01:20 AM, Tom Marchant wrote:
>> On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote:
>>
>>> A Hashmap potentially allows you to read sequentially and match 
>>> records between files, without caring about the order.
>> Can you please explain what you mean by this? Are you talking about 
>> using the hashmap to determine which record to read next, and so to 
>> read the records in an order that is logically sequential, but 
>> physically random? If so, that is not at all like reading the records 
>> sequentially.
>>
>
> If one file fits in memory, you can read it sequentially into a 
> Hashmap, using the data you want to match as the key.
> Then read the second one, also sequentially, retrieving matching 
> records from the Hashmap by key. You can also remove them from the 
> Hashmap as they are found if you need to know if any are unmatched.
>
> But this is a solution for a made up case - I don't know whether it is 
> a common situation. I was interested in hearing real reasons why sort 
> is so common on z/OS i.e. Why sort?
>
> On Hashmaps etc. in general - they are the memory equivalent to 
> indexed datasets (VSAM etc) versus sequential datasets. Their 
> availability opens up many new ways to process data - and algorithm 
> changes are often where the big savings can be made.
>
I believe others have already alluded to the potential time advantage of 
processing a large number of updates in key order rather than randomly when 
external data is indexed but actually physically ordered by some key.  The 
reason why this has historically been the case is that external disk storage 
devices which allow random access have rotational-latency and 
access-head-positioning delays, which are minimized when doing full-track or 
even multi-track I/O and when accessing adjacent cylinders.  The way to update the 
data in minimal real time is to do the I/O in minimal disk rotations, accessing 
all data needed on the same track in one rotation and all data in one cylinder 
before moving to an adjacent cylinder. Crucial to this concept is understanding 
that z/OS includes support within I/O access methods which allows applications 
to successfully exploit the ability of DASD hardware to transfer one, several, 
or all data blocks on a track as a single operation within a single disk 
revolution. 

With emulated DASD and hardware DASD caching, the effects of physical track and 
cylinder boundaries may be unknown, but it is still likely that minimizing 
repeated visitations to an emulated track or an emulated cylinder will achieve 
similar locality of reference on physical DASD, reduce latency delays and 
improve the effectiveness of hardware caching.  Processing transaction records 
in the same order as the database records are physically stored on an external 
file gives the best odds of grouping transactions needing the same track and 
cylinder together and for minimizing I/O delays and minimizing demands on DASD 
cache storage and processor storage for file buffers.  Processing transactions 
in a different order increases the likelihood that the file data needed to 
process the transaction is no longer in processor memory or disk cache and that 
at a minimum the time equivalent of another disk revolution will be required 
to obtain it.
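
The locality argument above can be illustrated with a toy model in Python. 
This is only a sketch under stated assumptions, not a model of real DASD: 
records are assumed to be stored in key order, a hypothetical 
RECORDS_PER_TRACK records fit on each track, and only one track is buffered 
at a time. The function name track_fetches is made up for the illustration.

```python
import random

RECORDS_PER_TRACK = 10  # hypothetical value, chosen only for illustration


def track_fetches(keys, records_per_track=RECORDS_PER_TRACK):
    """Count track reads needed to process keys in the given order.

    Assumes records are stored in key order and that only the most
    recently read track is still buffered (a deliberately crude model).
    """
    fetches = 0
    buffered_track = None
    for key in keys:
        track = key // records_per_track
        if track != buffered_track:
            fetches += 1          # the buffer cannot satisfy this request
            buffered_track = track
    return fetches


# 5,000 transactions against 100,000 records: 5% of the file is updated.
rng = random.Random(42)
transactions = rng.sample(range(100_000), 5_000)

unsorted_fetches = track_fetches(transactions)
sorted_fetches = track_fetches(sorted(transactions))
```

In this model the sorted run reads each needed track exactly once, while the 
unsorted run may return to the same track many times.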

It was not uncommon with VSAM files for transaction sorting to improve 
real-time processing speed sufficiently that the break-even point even with 
sorting overhead could be as low as updating only 5% of the database.  These 
techniques were common in MVS and its z/OS successor applications because it 
was common for those systems to deal with very large files and databases where 
tricks like this were necessary in order to meet constrained nightly batch 
processing windows.  Since it is common in z/OS to be dealing with very large 
files and databases, there are always files in those environments that are too 
large to consider placing the entire file in memory, no matter how large 
processor memory becomes.

Hash maps are not really equivalent to VSAM data sets because a VSAM file is 
not just indexed, but indexed-sequential, which means once you have 
successfully stored records in the file, reading the records in key order from 
a VSAM file is just a trivial sequential read.  A hash map makes it trivial to 
find a record with a given key, but if you also need to access the records in 
key order, a sort of the keys is still required.  I have applications that have 
used hash tables in exactly that way, doing a tag-sort of the keys after the 
fact to allow ordered access, but that is not a feature inherent in hash mapped 
records like it is with a VSAM data set.
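
A minimal Python sketch of the tag-sort idea, assuming a plain in-memory 
dict stands in for the hash table; the keys, payloads and the helper name 
in_key_order are invented for the example. Lookup by key is direct, but 
ordered access needs a separate sort of the keys.

```python
# Records arrive in arbitrary order and are stored by key.
records = {}
for key, payload in [("C300", "gamma"), ("A100", "alpha"), ("B200", "beta")]:
    records[key] = payload

# Finding a record with a given key needs no ordering at all.
beta_record = records["B200"]


def in_key_order(table):
    """Yield (key, record) pairs in ascending key order.

    The hash table itself has no key order, so the keys are sorted
    here after the fact (the "tag sort"); the records are not moved.
    """
    for key in sorted(table):
        yield key, table[key]


ordered = list(in_key_order(records))
```

A Python dict remembers insertion order, not key order, so the sort is 
still required whenever records arrive unsorted.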

As you point out, it is possible to process a transaction file against a 
database file without either being sorted, by reading records from one file 
(presumably the smaller one) into a hash map memory table and then processing 
the other file and searching the hash table for records with matching keys.  
This in general could require reading all records in both files.  While this 
is an interesting approach and could even be a reasonable approach in some 
cases, it doesn't scale well.  It would be very wasteful for very large 
databases/files when the transactions only affect a small percentage of the 
database records -- and again z/OS is an environment where very large databases 
and files are common.
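
For reference, the hash-map match described above might look like the 
following Python sketch. It assumes each file is an iterable of (key, data) 
pairs, that keys are unique within each file, and that the smaller file 
fits in memory; the name hash_match is hypothetical.

```python
def hash_match(small_file, large_file):
    """Match two unsorted files of (key, data) pairs by key.

    Returns (matched, unmatched_large, unmatched_small). Assumes keys
    are unique within each file and small_file fits in memory.
    """
    # Pass 1: read every record of the smaller file into the hash table.
    table = {key: data for key, data in small_file}

    matched = []
    unmatched_large = []
    # Pass 2: read every record of the other file, in whatever order it is in.
    for key, data in large_file:
        if key in table:
            # Removing the entry lets us report leftovers afterwards.
            matched.append((key, table.pop(key), data))
        else:
            unmatched_large.append((key, data))

    # Whatever remains in the table was never matched.
    return matched, unmatched_large, table


small = [("K2", "b"), ("K1", "a"), ("K3", "c")]
large = [("K3", "x"), ("K9", "y"), ("K1", "z")]
matched, unmatched_large, unmatched_small = hash_match(small, large)
```

Both loops touch every record of their file, which is the scaling concern: 
the cost follows the total size of the two files, not the number of 
records actually updated.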

-- 
Joel C. Ewing,    Bentonville, AR       jcew...@acm.org

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
