Paul Gilmartin wrote:
On Tue, 27 Jul 2010 22:06:18 +0000, john gilmore wrote:
Yes, 61, which is prime, is better than 64 = 2^6, which is composite.

...


If division-method hashing is used a prime divisor/modulus is highly desirable. 
 Clustering at the prime divisors of a composite modulus does occur.  I dislike 
arguments from authority, but 1) this is not the place for a bibliography and 
2) RKFATWTF.

For pure division-method, surely.  Modulo 64 merely extracts the
rightmost 6 bits of the original string.  But you had suggested
CKSUM, then modulo.  And if CKSUM is of high quality, I'd expect
any modulus to give good results.


Unfortunately CKSUM is not high quality. It's pathologically pathetic. It's nothing more than an additive hash. You will be forced to use a hash table sized by a prime number. The best hash table sizes are powers of two so you can use bitwise ANDs instead of mod division, much
faster for dynamically resizing the table.

A good hash function must achieve avalanche which requires a mixing step using magic ratios (usually a special prime number). I've timed just about every known hashing algorithm for strings and the best one by some distance on z is the murmur hash.

http://sites.google.com/site/murmurhash/

Here's the murmur 2 hash assembler produced by the Metal/C compiler.

HASHMUR#C CSECT
HASHMUR#C AMODE 31
HASHMUR#C RMODE ANY
        SYSSTATE ARCHLVL=2
* unsigned int MurmurHash2(
*    const void * key,
*    int len,
*    unsigned int seed
*    )
.* The HLASM GOFF option is needed to assemble this code
@@c...@1  ALIAS C'MurmurHash2'
        ENTRY @@c...@1
@@c...@1  AMODE 31
@@c...@1  RMODE ANY
@@c...@1  DS    0F
        STM   14,6,12(13)
        LR    15,13
        L     13,8(,13)
        ST    15,4(,13)
@@b...@1  DS    0H
        USING @@a...@1,13
        LARL  3,@@l...@1
        USING @@l...@1,3
        STMH  14,6,80(13)
* {
*     // 'm' and 'r' are mixing constants generated offline.
*     // They're not really 'magic', they just happen to work well.
*     const unsigned int m = 0x5BD1E995;
*     const int r = 24;
*
*     // Initialize the hash to a 'random' value
*     unsigned int h = seed ^ len;
        USING @@pa...@1,1
        L     15,@4seed
        L     0,@3len
        X     15,@3len
        CHI   0,4
*
*     // Mix 4 bytes at a time into the hash
*     const unsigned char * data = (const unsigned char *)key;
        L     14,@2key
*
*     while(len >= 4)
        BRL   @1L4
        LR    2,0
        AHI   2,-4
        SRL   2,2
        AHI   2,1
        SLLG  4,2,2
        LARL  1,@@co...@area@@
        LR    6,2
        SLR   0,4
@1L3     DS    0H
*     {
*        unsigned int k = *(unsigned int *)data;
        L     4,0(,14)                (*)uint
*
*        k *= m;
        MS    4,0(,1)
*        k ^= k >> r;
        LR    5,4
        SRL   5,24
        XR    4,5
        MS    15,=F'1540483477'
*        k *= m;
        MS    4,=F'1540483477'
*
*        h *= m;
*        h ^= k;
*
*        data += 4;
        LA    14,4(,14)               #AddressShadow
        XR    15,4
        BRCT  6,@1L3
@1L4     DS    0H
*        len -= 4;
*     }
*
*     // Handle the last few bytes of the input array
*     switch(len)
        CHI   0,1
        BRE   @1L7
        CHI   0,2
        BRE   @1L8
        CHI   0,3
        BRNE  @1L5
*     {
*     case 3: h ^= data[2] << 16;
        LLGC  0,2(0,14)               (*)Cuchar
        SLL   0,16
        XR    15,0
@1L8     DS    0H
*     case 2: h ^= data[1] << 8;
        LLGC  0,1(0,14)               (*)Cuchar
        SLL   0,8
        XR    15,0
@1L7     DS    0H
*     case 1: h ^= data[0];
        LLGC  14,0(0,14)              (*)Cuchar
        XR    15,14
*             h *= m;
        MS    15,=F'1540483477'
@1L5     DS    0H
*     };
*
*     // Do a few final mixes of the hash to ensure the last few
*     // bytes are well-incorporated.
*     h ^= h >> 13;
        LR    14,15
        SRL   14,13
        XR    15,14
*     h *= m;
        MS    15,=F'1540483477'
*     h ^= h >> 15;
        LR    14,15
        SRL   14,15
        XR    15,14
*
*     return h;
* }
@1L6     DS    0H
        LMH   14,6,80(13)
        DROP
        L     13,4(,13)
        L     14,12(,13)
        LM    1,6,24(13)
        BR    14
        DS    0F
@@l...@1  LTORG
        EJECT
@@a...@1 DSECT
        DS    XL144
        ORG   @@a...@1
#GPR_SA_1 DS   18F
        DS    F
@@pa...@1 DSECT
        DS    XL12
        ORG   @@pa...@1+0
@2key    DS    F
        ORG   @@pa...@1+4
@3len    DS    F
        ORG   @@pa...@1+8
@4seed   DS    F
        EJECT
HASHMUR#C CSECT ,
@@co...@area@@ DS 0D
        DC    XL4'5BD1E995'
        END   ,(5694A01   ,1B00,10209)


RKFATWTF!?  Not in my lexicon.  Nor in Google's, apparently.  But
the last three characters are familiar and hauntingly apt.

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html


----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html

Reply via email to