cmp requires both files, right? I suspect the problem here is "is the file 
different today from what it was yesterday?"
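
That distinction can be sketched quickly: cmp needs both copies on hand, while a stored digest answers "did it change since yesterday?" without keeping yesterday's data at all. A minimal Python sketch, with hashlib.sha1 standing in for whatever hash is actually chosen and "master.dat" as a hypothetical file name:

```python
import hashlib

def digest(path):
    """SHA-1 of a file's bytes, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Yesterday's run keeps only the 40-hex-character digest, not the file:
#     saved = digest("master.dat")        # "master.dat" is hypothetical
# Today's run answers the question without yesterday's copy:
#     changed = digest("master.dat") != saved
```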

Interesting point on record lengths. Even RDW hashes don't actually solve the 
entire (theoretical -- probably not real) problem. What about two FB files with 
the same bits but different record lengths (e.g., 500 80-byte records versus 
1000 40-byte records)? They are unarguably different, but a hash might yield 
the same sum for both.
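
That collision is real for any hash that sees only the concatenated bytes; mixing the record length into the input distinguishes the two layouts. A Python sketch (hashlib.sha1 is just a stand-in for whatever hash is actually used):

```python
import hashlib

data = b"x" * 40000  # the same 40,000 bytes under either record format

def plain_hash(records):
    # Hashes only the data bytes: blind to record structure.
    h = hashlib.sha1()
    for rec in records:
        h.update(rec)
    return h.hexdigest()

def length_aware_hash(records):
    # Prefixes each record with its length, RDW-style, so the
    # record structure is part of the hashed input.
    h = hashlib.sha1()
    for rec in records:
        h.update(len(rec).to_bytes(2, "big"))
        h.update(rec)
    return h.hexdigest()

fb80 = [data[i:i + 80] for i in range(0, len(data), 80)]  # 500 x 80
fb40 = [data[i:i + 40] for i in range(0, len(data), 40)]  # 1000 x 40

print(plain_hash(fb80) == plain_hash(fb40))                # True: collision
print(length_aware_hash(fb80) == length_aware_hash(fb40))  # False: distinct
```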

Is the requirement exclusively QSAM files? What about VSAM? What about PDS? 
PDSE? PDS(E) as a whole or member by member? I strongly suspect that, unlike 
for a QSAM file, if I overwrote a PDSE member with identical data, the hash 
for the PDSE as a whole would change.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Paul Gilmartin
Sent: Thursday, August 20, 2015 1:51 PM
To: [email protected]
Subject: Re: Ideas for hash of a sequential data set

On Thu, 20 Aug 2015 11:55:22 -0500, Kirk Wolf wrote:
>
>The problem, of course, is that DSCBs don't have "last update timestamps".
>
And in systems that have them, timestamps often can be forged by the user.
I once recovered a z/OS HSM-backed-up data set and was dismayed to see that the 
timestamp was set to the time of recovery rather than the time of last access.

>My initial whack at this would be to use a two-part hash:
>
>part 1: a shortened SHA1-hash of the format-1/8 DSCB part 2: a full 
>SHA-1 hash of all of the data
>
Your hashing should be sensitive to record boundaries, else an operation as 
simple as splitting a record in two will not be detected as a change.
Perhaps hash the RDWs also.  (I argued this on CMS-PIPELINES a while ago.  The 
Bad Guys won.)
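
The record-split case above is easy to demonstrate: a data-only hash misses it, while folding a synthetic RDW (2-byte length including the 4-byte RDW itself, plus 2 reserved bytes) into the input catches it. A hedged Python sketch, again with hashlib.sha1 as a placeholder:

```python
import hashlib

def data_only_hash(records):
    # Hashes only the record data: blind to where the boundaries fall.
    return hashlib.sha1(b"".join(records)).hexdigest()

def rdw_aware_hash(records):
    # Hashes a synthetic RDW ahead of each record: the length field
    # counts the record plus the 4-byte RDW, followed by 2 reserved
    # bytes, so record boundaries affect the digest.
    h = hashlib.sha1()
    for rec in records:
        h.update((len(rec) + 4).to_bytes(2, "big") + b"\x00\x00")
        h.update(rec)
    return h.hexdigest()

original = [b"HELLO WORLD"]
split = [b"HELLO ", b"WORLD"]  # same bytes, one record split in two

print(data_only_hash(original) == data_only_hash(split))  # True: missed
print(rdw_aware_hash(original) == rdw_aware_hash(split))  # False: caught
```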

Would performance be better by replacing hash with diff(1) or cmp(1)?

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
