Re: Ideas for hash of a sequential data set

Paul Gilmartin Thu, 20 Aug 2015 13:51:19 -0700

On Thu, 20 Aug 2015 11:55:22 -0500, Kirk Wolf wrote:
>
>The problem, of course, is that DSCBs don't have "last update timestamps".
>
And in systems that have them, timestamps often can be forged by the user.
I once recovered a data set that z/OS HSM had backed up and was dismayed
to see that its timestamp was set to the time of recovery rather than the
time of last access.


>My initial whack at this would be to use a two-part hash:
>
>part 1: a shortened SHA1-hash of the format-1/8 DSCB
>part 2: a full SHA-1 hash of all of the data
>
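A minimal sketch of the two-part hash quoted above, in Python. It assumes the format-1/8 DSCB has already been read into an opaque byte string and that the data arrives as an iterable of byte chunks. The function name and the 16-hex-digit truncation are hypothetical choices, not anything Kirk specified.

```python
import hashlib

# Hypothetical truncation length for the "shortened" DSCB hash, in hex digits.
SHORT_HASH_HEX_DIGITS = 16


def two_part_hash(dscb_bytes: bytes, data_chunks) -> str:
    """Return 'short-DSCB-hash:full-data-hash'.

    dscb_bytes: raw format-1/8 DSCB, treated as opaque bytes (assumption).
    data_chunks: any iterable of bytes objects holding the data set contents.
    """
    # Part 1: shortened SHA-1 of the DSCB.
    meta = hashlib.sha1(dscb_bytes).hexdigest()[:SHORT_HASH_HEX_DIGITS]

    # Part 2: full SHA-1 of all the data, fed incrementally.
    data = hashlib.sha1()
    for chunk in data_chunks:
        data.update(chunk)

    return f"{meta}:{data.hexdigest()}"
```

Note that the data part hashes a plain byte stream, so `[b"ab", b"c"]` and `[b"abc"]` produce the same value. Chunk (and record) boundaries are invisible to it, which is the weakness raised in the reply.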
Your hashing should be sensitive to record boundaries; otherwise an operation
as simple as splitting a record in two will not be detected as a change.
Perhaps hash the RDWs also.  (I argued this on CMS-PIPELINES a while
ago.  The Bad Guys won.)
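One way to make the hash record-boundary sensitive is to feed an RDW in front of every record. The following is a minimal sketch, assuming records are available as Python `bytes` and that a synthesized non-spanned RDW (2-byte big-endian length including the 4-byte RDW itself, then 2 zero bytes) is acceptable even for fixed-length data. `rdw` and `record_hash` are hypothetical helper names.

```python
import hashlib
import struct

# Largest variable-length LRECL, which includes the 4-byte RDW.
MAX_LRECL = 32760


def rdw(record: bytes) -> bytes:
    """Synthesize a non-spanned RDW: length (incl. RDW) + 2 zero bytes."""
    length = len(record) + 4
    if length > MAX_LRECL:
        raise ValueError("record too long for a non-spanned RDW")
    return struct.pack(">HH", length, 0)


def record_hash(records) -> str:
    """SHA-1 over RDW + data for each record, so boundaries affect the result."""
    h = hashlib.sha1()
    for rec in records:
        h.update(rdw(rec))
        h.update(rec)
    return h.hexdigest()
```

Splitting `b"ABCDEF"` into `b"ABC"` and `b"DEF"` leaves the concatenated bytes unchanged, so a plain SHA-1 of the data does not move. With the RDWs included, `record_hash` does.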

Would performance be better if you replaced the hash with diff(1) or cmp(1)?
Same amount of I/O; less computation.  And cmp can exit early on
detecting the first difference, so there is no need for a preliminary short hash.
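A cmp(1)-style comparison only works when both the old and new copies are available side by side, and this sketch takes that as given. It reads both streams in chunks and stops at the first mismatch. It assumes binary streams whose `read(n)` returns `n` bytes until EOF, which holds for files opened with `open(path, "rb")`. The function name and chunk size are hypothetical. For a simple yes/no answer, Python's standard `filecmp.cmp(f1, f2, shallow=False)` also exists.

```python
from typing import BinaryIO, Optional


def first_difference(a: BinaryIO, b: BinaryIO,
                     chunk_size: int = 65536) -> Optional[int]:
    """Return the 0-based offset of the first differing byte, or None.

    Like cmp(1), stop reading as soon as a difference is found.
    (cmp itself reports 1-based byte numbers.)
    """
    offset = 0
    while True:
        ca = a.read(chunk_size)
        cb = b.read(chunk_size)
        if ca != cb:
            # Find the exact mismatching byte inside this chunk.
            n = min(len(ca), len(cb))
            for i in range(n):
                if ca[i] != cb[i]:
                    return offset + i
            # One stream ended early: the difference is at its EOF.
            return offset + n
        if not ca:
            # Both streams exhausted with no mismatch.
            return None
        offset += len(ca)
```

Usage: `with open(old, "rb") as a, open(new, "rb") as b: first_difference(a, b)`.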

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
