Re: Why sort (was Microprocessor Optimization Primer)

Mitch Mccluhan Thu, 07 Apr 2016 05:22:05 -0700

...hey Wayne.

Mitch Mccluhan
mitc...@aol.com


On Thursday, April 7, 2016 Wayne Bickerdike <wayn...@gmail.com> wrote:
I'm slightly gobsmacked that this discussion is needed. I guess the forest
is lost in the trees.

I can recommend "Principles of Program Design" by Michael Jackson c. 1975.

Of greater concern is the implication that Oracle on AIX outperforms DB2 on
z/OS at our shop. Surely not :(

On Thu, Apr 7, 2016 at 2:59 PM, Joel C. Ewing <jcew...@acm.org> wrote:

> On 04/06/2016 07:01 AM, Andrew Rowley wrote:
> > On 05/04/2016 01:20 AM, Tom Marchant wrote:
> >> On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote:
> >>
> >>> A Hashmap potentially allows you to read sequentially and match records
> >>> between files, without caring about the order.
> >> Can you please explain what you mean by this? Are you talking about
> >> using
> >> the hashmap to determine which record to read next, and so to read the
> >> records in an order that is logically sequential, but physically
> >> random? If so,
> >> that is not at all like reading the records sequentially.
> >>
> >
> > If one file fits in memory, you can read it sequentially into a
> > Hashmap, using the data you want to match as the key.
> > Then read the second one, also sequentially, retrieving matching
> > records from the Hashmap by key. You can also remove them from the
> > Hashmap as they are found if you need to know if any are unmatched.
> >
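A minimal sketch of the in-memory match described above, assuming keys are unique within each file and that the first file fits in memory. The function name `match_by_key`, the record layout, and the sample data are hypothetical and not taken from the thread.

```python
def match_by_key(first_file, second_file, key):
    """Match two unsorted record streams using an in-memory hash map (dict)."""
    # Pass 1: read the first file sequentially into a hash map keyed on the match field.
    table = {key(rec): rec for rec in first_file}

    matched = []             # (first_file_record, second_file_record) pairs
    unmatched_second = []    # second-file records with no partner

    # Pass 2: read the second file sequentially, looking up each record by key.
    for rec in second_file:
        partner = table.pop(key(rec), None)  # remove on match so leftovers are visible
        if partner is None:
            unmatched_second.append(rec)
        else:
            matched.append((partner, rec))

    # Whatever is still in the table was never matched by the second file.
    unmatched_first = list(table.values())
    return matched, unmatched_second, unmatched_first


# Hypothetical sample data: (key, payload) tuples in no particular order.
masters = [("C", 3), ("A", 1), ("B", 2)]
updates = [("B", "upd"), ("D", "upd"), ("A", "upd")]
matched, unmatched_updates, unmatched_masters = match_by_key(
    masters, updates, key=lambda rec: rec[0]
)
```

Neither input is sorted, and each is read exactly once from start to end.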
> > But this is a solution for a made up case - I don't know whether it is
> > a common situation. I was interested in hearing real reasons why sort
> > is so common on z/OS, i.e. why sort?
> >
> > On Hashmaps etc. in general - they are the memory equivalent to
> > indexed datasets (VSAM etc) versus sequential datasets. Their
> > availability opens up many new ways to process data - and algorithm
> > changes are often where the big savings can be made.
> >
> I believe others have already alluded to the potential time advantage of
> processing a large number of updates in key order rather than randomly
> when external data is indexed but actually physically ordered by some
> key. The reason why this has historically been the case is that
> external disk storage devices which allow random access have
> rotational-latency delay and access-head-positioning delay which is
> minimized when doing full-track or even multi-track I/O and when
> accessing adjacent cylinders. The way to update the data in minimal
> real time is to do the I/O in minimal disk rotations, accessing all data
> needed on the same track in one rotation and all data in one cylinder
> before moving to an adjacent cylinder. Crucial to this concept is
> understanding that z/OS includes support within I/O access methods which
> allows applications to successfully exploit the ability of DASD hardware
> to transfer one, several, or all data blocks on a track as a single
> operation within a single disk revolution.
>
> With emulated DASD and hardware DASD caching, the effects of physical
> track and cylinder boundaries may be unknown, but it is still likely
> that minimizing repeated visitations to an emulated track or an
> emulated cylinder will achieve similar locality of reference on physical
> DASD, reduce latency delays and improve the effectiveness of hardware
> caching. Processing transaction records in the same order as the
> database records are physically stored on an external file gives the
> best odds of grouping transactions needing the same track and cylinder
> together, minimizing I/O delays and reducing demands on DASD
> cache storage and processor storage for file buffers. Processing
> transactions in a different order increases the likelihood that the
> file data needed to process the transaction is no longer in processor
> memory or disk cache and that at a minimum the time equivalent of
> another disk revolution will be required to obtain it.
>
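The locality argument above can be illustrated with a toy model. This is only a sketch under loudly stated assumptions: records are stored physically in key order, a fixed number fit on each track (`RECORDS_PER_TRACK` is a made-up figure), and exactly one track is buffered at a time. Real DASD, emulated or not, has caching and geometry this model ignores.

```python
import random

RECORDS_PER_TRACK = 10  # hypothetical; real track capacity depends on block size


def track_of(key):
    """Track holding a record, assuming records are laid out in key order."""
    return key // RECORDS_PER_TRACK


def track_fetches(keys):
    """Count track fetches when only the most recently read track is buffered."""
    fetches = 0
    buffered = None
    for k in keys:
        t = track_of(k)
        if t != buffered:  # needed track is not the buffered one: fetch it
            fetches += 1
            buffered = t
    return fetches


rng = random.Random(0)                       # fixed seed so the run is repeatable
txn_keys = rng.sample(range(10_000), 500)    # 500 transactions against 10,000 records

unsorted_fetches = track_fetches(txn_keys)
sorted_fetches = track_fetches(sorted(txn_keys))
distinct_tracks = len({track_of(k) for k in txn_keys})
```

In this model the sorted run touches each needed track exactly once, so `sorted_fetches` equals `distinct_tracks`, while the unsorted run can never do better than that and typically revisits tracks.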
> It was not uncommon with VSAM files for transaction sorting to improve
> real-time processing speed sufficiently that the break-even point even
> with sorting overhead could be as low as updating only 5% of the
> database. These techniques were common in MVS and its z/OS successor
> applications because it was common for those systems to deal with very
> large files and databases where tricks like this were necessary in order
> to meet constrained nightly batch processing windows. Since it is
> common in z/OS to be dealing with very large files and databases, there
> are always files in those environments that are too large to consider
> placing the entire file in memory, no matter how large processor memory
> becomes.
>
> Hash maps are not really equivalent to VSAM data sets because a VSAM
> file is not just indexed, but indexed-sequential, which means once you
> have successfully stored records in the file, reading the records in key
> order from a VSAM file is just a trivial sequential read. A hash map
> makes it trivial to find a record with a given key, but if you also need
> to access the records in key order, a sort of the keys is still
> required. I have applications that have used hash tables in exactly
> that way, doing a tag-sort of the keys after the fact to allow ordered
> access, but that is not a feature inherent in hash mapped records like
> it is with a VSAM data set.
>
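A short sketch of the "tag-sort" idea mentioned above, assuming a Python dict stands in for the hash table. The function name `in_key_order` and the sample keys are hypothetical. Only the keys are sorted; the records stay where they are and are fetched through the hash table.

```python
def in_key_order(table):
    """Yield (key, record) pairs in ascending key order from a hash table."""
    # Sort just the keys (the "tags"); each record is then fetched by hash lookup.
    for k in sorted(table):
        yield k, table[k]


# Hypothetical records loaded in arbitrary order.
records = {"K30": "third", "K10": "first", "K20": "second"}
ordered = list(in_key_order(records))
```

A Python dict remembers insertion order rather than key order, so the explicit sort is still needed here, which is the point being made about hash maps versus an indexed-sequential file.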
> As you point out, it is possible to process a transaction file
> against a database file without either being sorted by reading records
> from one file (presumably the smaller one) into a hash map memory table
> and then processing the other file and searching the hash table for
> records with matching keys. This in general could require reading all
> records in both files. While this is an interesting approach and could
> even be a reasonable approach in some cases, it doesn't scale well. It
> would be very wasteful for very large databases/files when the
> transactions only affect a small percentage of the database records --
> and again z/OS is an environment where very large databases and files
> are common.
>
> --
> Joel C. Ewing, Bentonville, AR jcew...@acm.org
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
>



-- 
Wayne V. Bickerdike

