Re: Why sort (was Microprocessor Optimization Primer)

John McKown Wed, 06 Apr 2016 05:57:46 -0700

On Wed, Apr 6, 2016 at 7:01 AM, Andrew Rowley <[email protected]>
wrote:

> On 05/04/2016 01:20 AM, Tom Marchant wrote:
>
>> On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote:
>>
>>> A Hashmap potentially allows you to read sequentially and match records
>>> between files, without caring about the order.
>>
>> Can you please explain what you mean by this? Are you talking about using
>> the hashmap to determine which record to read next, and so to read the
>> records in an order that is logically sequential, but physically random?
>> If so, that is not at all like reading the records sequentially.
>>
>>
> If one file fits in memory, you can read it sequentially into a Hashmap
> using the data you want to match as the key.
> Then read the second one, also sequentially, retrieving matching records
> from the Hashmap by key. You can also remove them from the Hashmap as they
> are found if you need to know if any are unmatched.
>
> But this is a solution for a made up case - I don't know whether it is a
> common situation. I was interested in hearing real reasons why sort is so
> common on z/OS i.e. Why sort?
>
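
For anyone following along, the two-pass match described above might look roughly like this in Python, where a dict plays the role of the Hashmap. This is only a sketch: the function name, the CSV layout, and the sample data are made up for illustration, and it assumes the first input fits in memory with unique keys.

```python
import csv
import io


def match_records(first_file, second_file, key_field):
    """Match rows of two unsorted CSV inputs on key_field using a dict."""
    # Pass 1: read the first input sequentially into a dict keyed on the
    # match field. Assumes it fits in memory and keys are unique within it.
    unmatched = {row[key_field]: row for row in csv.DictReader(first_file)}

    matched = []
    # Pass 2: read the second input sequentially, looking up each key.
    for row in csv.DictReader(second_file):
        partner = unmatched.pop(row[key_field], None)  # remove on match
        if partner is not None:
            matched.append((partner, row))

    # Whatever is still in the dict had no partner in the second input.
    return matched, list(unmatched.values())


# Hypothetical sample data; neither input is sorted on "id".
accounts = io.StringIO("id,name\n3,Carol\n1,Alice\n2,Bob\n")
payments = io.StringIO("id,amount\n2,40\n3,15\n")
matched, leftover = match_records(accounts, payments, "id")
```

Neither input is sorted, and each is read exactly once. Because a matched entry is removed from the dict, a key that repeats in the second input will only match the first time; keeping the entry instead would change that, at the cost of losing the "unmatched" list.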

Not meaning to sound silly, but I fear the main reason may be the good
old: "We've always done it that way". 

And, since most of the in-house software written for z/OS is in some version
of COBOL, there is no other real choice, because COBOL does not have
anything like a content-addressable "array" built into the language. IMO, a
major deficiency in IBM's COBOL, and maybe other vendors' COBOLs, is that
it does not come with a great library of functionality. It is simple to do
things in Java, Perl, PHP, Python, and Go because of the huge amount of
support in their libraries. COBOL has only the barest of native data
types, and only integer-indexed arrays and structures as ways
to "group" things together. It also has pretty much the barest of run-time
routines, and the only way to invoke anything in a library is via the
CALL verb. I guess it's sad that the object-oriented portion of the
latest COBOL compilers seems to be ignored.

So, why not migrate away from COBOL to a more advanced language? Many
places are doing so for new work or development (or going to a non-z
platform). Also, do you really need to buffer up everything in a Hashmap if
your data resides in a relational database? It is generally much better to
let the RDBMS do most of the work. And it will buffer up the active data,
not only from your program but every program which is accessing the data.
In this case, doing a SORT may well be unnecessary. Or you may need to
do a SORT if you are writing a report sorted by a value created in the
program itself. Do you really want to use a Hashmap to store the unsorted
electricity bills for Los Angeles and then, at the end, read the Hashmap
back by key to write said bills out in order? This sort of thing goes on a
_lot_ on z/OS. Just my take on it.
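
To make the point about the sorted report concrete, here is a small Python sketch with a dict standing in for the Hashmap. The account numbers and amounts are invented. The dict gives fast lookup by key but no key order, so writing the bills in account order still means sorting the keys somewhere.

```python
def bills_in_account_order(bills):
    """Yield (account, amount) pairs in ascending account order."""
    # A dict offers lookup by key but no ordering by key, so an ordered
    # report still needs a sort. sorted() builds the full key list in memory.
    for account in sorted(bills):
        yield account, bills[account]


# Hypothetical bills, held in arrival order rather than account order.
bills = {"LA-0300": 81.10, "LA-0100": 42.50, "LA-0200": 17.25}
report = list(bills_in_account_order(bills))
```

So the Hashmap does not remove the sort here, it just moves it into the program, where both the data and the key list have to fit in memory.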

I'm not against using something other than SORT if I think it will work
well. But SORT (DFSORT & Syncsort) are extremely fast and efficient. So if
I need something done which they can do, then I think it is best to use
them rather than code something up myself, in any language.

>
> On Hashmaps etc. in general - they are the memory equivalent to indexed
> datasets (VSAM etc) versus sequential datasets. Their availability opens up
> many new ways to process data - and algorithm changes are often where the
> big savings can be made.
>
>
-- 
How many surrealists does it take to screw in a lightbulb? One to hold the
giraffe and one to fill the bathtub with brightly colored power tools.

Maranatha! <><
John McKown

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
