On Feb 17, 2011, at 07:56, John P. Hartmann wrote:

> Below is an EXEC I wrote to check whether the 32-bit CRC is really
> unique.  As it does a SORT UNIQUE, there are limits to the size of
> files that can be checked.
>
This empirical approach is well supplemented by theory:

    http://en.wikipedia.org/wiki/Birthday_attack

In the simplest computation, the probability of a collision among
32-bit values reaches about 50% at roughly 77,000 samples, a little
over 2^16.  This ought to be well within reach of your test.
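
The figure above can be checked with the usual birthday approximation,
p ~ 1 - exp(-n(n-1)/(2N)).  A minimal sketch, assuming the CRC-32 values
behave like uniformly random 32-bit numbers (which real data may not
honor); the function names are illustrative only:

```python
import math

def collision_probability(samples: int, bits: int = 32) -> float:
    """Approximate chance that at least two of `samples` uniformly
    random `bits`-bit values coincide (birthday approximation)."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1)/(2N)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-samples * (samples - 1) / (2.0 * space))

def samples_for_probability(p: float, bits: int = 32) -> float:
    """Approximate sample count at which the collision chance reaches p."""
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(1.0 / (1.0 - p)))

if __name__ == "__main__":
    print(collision_probability(2 ** 16))   # a bit under 0.4
    print(samples_for_probability(0.5))     # a bit over 77,000
```

So at exactly 2^16 samples the chance is somewhat below even, and it
crosses 50% a little later.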

I'm confident that MD5 is better than CRC-32, and MD5 is now deprecated.
SHA-1, etc. are better.  Wang, Yin, and Yu showed in 2005 that a SHA-1
collision can be found in about 2^69 hash operations, well below the
2^80 of a generic birthday search.  I don't know that an actual
collision has been exhibited.

Somewhere I saw IBM's description of its SuperC.  IIRC, it begins
by computing a hash of each record.
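
The record-hash idea can be sketched as follows.  This is only an
illustration of the general technique, not SuperC's actual algorithm;
the function name and the choice of CRC-32 are assumptions.  Differing
hashes prove that two records differ, while equal hashes still need a
byte comparison because of possible collisions:

```python
import zlib

def files_differ(left: list[bytes], right: list[bytes]) -> bool:
    """Return True if two record lists differ, False if identical.

    Hypothetical helper: hashes each record with CRC-32 and falls back
    to a byte comparison whenever the hashes agree.
    """
    if len(left) != len(right):
        return True
    for a, b in zip(left, right):
        if zlib.crc32(a) != zlib.crc32(b):
            return True   # different hashes: records certainly differ
        if a != b:
            return True   # same hash, different bytes: a collision
    return False

if __name__ == "__main__":
    print(files_differ([b"one", b"two"], [b"one", b"two"]))
    print(files_differ([b"one", b"two"], [b"one", b"2"]))
```

For a plain identical/different answer the hash is only a shortcut;
it pays off mainly when the hashes are stored and reused.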

Are you considering writing a compare stage?  It would be difficult
to do better than, or even match, SuperC.  But a few years ago I asked
about a compare stage.  I need only an identical/different indication.
I'm using COMPARE MODULE (I think it's called).

-- gil
