If you want to see a good example of a well-used assembler hashing algorithm, you need to be an IMS customer. IMS has the curiously misnamed HDAM, PHDAM and DEDB "randomiser". It would more correctly be called a hashing algorithm. The purpose of DFSHDC40, DBFHDC40 and DBFHDC44 is to take a database key and hash it to a root anchor point (RAP) within the database. Those routines have been around since IMS first got the hierarchical direct access method. We pass four parameters: the key, the number of RAPs, the number of blocks in the root addressable area (RAA) and the maximum number of bytes in a root segment to go in the RAA.
The output is a block and RAP number (or area, block and RAP for DEDB). The code used to be published in the IMS docs; it's now only in IMS.SDFSSRC. The comments say it uses a "RANGE RATIO METHOD" to hash the key to the RAP.

If anyone would like a copy of the source code, send me an email off the list.

On 2 November 2012 21:43, Paul Gilmartin <[email protected]> wrote:
> On 2012-11-02 15:01, Martin Truebner wrote:
> > I still do not see how changing a numbering scheme from based
> > on ten to a system based on thirty-six does create any clusters.
> >
> It doesn't. Robin has pretty much acknowledged that if the
> input data are uniform, a modulus hash will likewise be uniform.
> Others in this thread are more fixated on defending prime
> moduli.
>
> But consider: if the input data are binary strings and you
> choose 2^n as a modulus, the hash will merely be the last
> n bits of the input datum. Powers of two are a bad choice
> for modulus-hashing EBCDIC text where that last character
> is likely to be clustered around displayable code points.
>
> OTOH if the input data are base-37 numbers a modulus 37
> hash merely returns the last digit; again a bad choice.
>
> But computer scientists have a cultural bias toward base
> 2, not any other prime such as 37, even as number theorists
> have some cultural bias toward base 10 (evident, at least,
> in recreational essays).
>
> -- gil
>

--
http://twitter.com/DougieLawson
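
To make gil's point about power-of-two moduli concrete, here is a minimal Python sketch. It is not the IMS randomiser or its range ratio method. It assumes hypothetical keys of the form CUSTnnnn encoded in EBCDIC code page 037, and the helper name mod_hash is made up for illustration.

```python
def mod_hash(key: bytes, modulus: int) -> int:
    """Treat the key as one big-endian binary number and take it modulo `modulus`."""
    return int.from_bytes(key, "big") % modulus


# Hypothetical keys: EBCDIC (cp037) text ending in four decimal digits.
keys = [f"CUST{i:04d}".encode("cp037") for i in range(1000)]

# Modulus 2^8: the hash is just the last byte of the key, so only the
# ten EBCDIC digit code points 0xF0..0xF9 are ever used as buckets.
pow2_buckets = {mod_hash(k, 256) for k in keys}

# A modulus that is not a power of two (251 here, an arbitrary prime)
# lets every byte of the key influence the result.
prime_buckets = {mod_hash(k, 251) for k in keys}

print("buckets used with modulus 256:", len(pow2_buckets))
print("buckets used with modulus 251:", len(prime_buckets))
```

With modulus 256 the thousand keys pile into the ten buckets that correspond to the trailing digit, which is the clustering described in the quoted text. The other modulus spreads the same keys over many more buckets.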
