writes:
 
<begin snippet>
The entries are short (4 bytes) and the key is the entire entry. My 
understanding is that in this situation a hash table has no benefit. Is this 
correct?
</end snippet>
 
No.  Having four-byte keys does, however, simplify hashing.
 
Pick a small prime modulus p.  Treat your four bytes as a fullword signed 
integer.  Divide this integer by p.  The NONNEGATIVE remainder r will be in the 
interval
 
0 <= r <=  p - 1.
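
As a minimal sketch of the step above, in Python: the function name bucket_index, the default p = 17, and the big-endian byte order are assumptions made for illustration, not part of the original description.

```python
def bucket_index(entry: bytes, p: int = 17) -> int:
    """Map a 4-byte entry to a sublist index in the range 0..p-1."""
    if len(entry) != 4:
        raise ValueError("entry must be exactly 4 bytes")
    # Interpret the four bytes as a signed 32-bit integer.
    # Big-endian byte order is an assumption here.
    value = int.from_bytes(entry, byteorder="big", signed=True)
    # For a positive modulus, Python's % never returns a negative result,
    # so no sign correction is needed even when value is negative.
    return value % p
```

In a language whose remainder operator takes the sign of the dividend (C, for example), a negative remainder would need p added to it to land in the same interval.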
 
Use an array of pointers to p sublists, slp(0:p-1), chaining your entries 
together within each sublist, i.e., inserting them in ascending algebraic 
sequence.
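
The scheme might be sketched as follows.  This is only an illustration: the class name ChainedTable and its methods are hypothetical, and each sublist is an ordinary sorted Python list searched with bisect rather than a pointer-chained list, which changes the search within a sublist but not the idea of p separate sublists.

```python
import bisect


class ChainedTable:
    """p sorted sublists, indexed by the nonnegative remainder value % p."""

    def __init__(self, p: int = 17):
        self.p = p
        # slp[r] holds, in ascending order, every value whose remainder is r.
        self.slp = [[] for _ in range(p)]

    def insert(self, value: int) -> bool:
        """Insert value in ascending position; return False if already present."""
        sublist = self.slp[value % self.p]
        i = bisect.bisect_left(sublist, value)
        if i < len(sublist) and sublist[i] == value:
            return False
        sublist.insert(i, value)
        return True

    def contains(self, value: int) -> bool:
        """Search only the one sublist that value can belong to."""
        sublist = self.slp[value % self.p]
        i = bisect.bisect_left(sublist, value)
        return i < len(sublist) and sublist[i] == value
```

A lookup touches only the sublist selected by the remainder, so the other p - 1 sublists are never examined.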
 
Let N be the sum of the numbers of elements in all of these sublists.  When you 
get a new value, you will then need to search only a sublist of average length 
N/p instead of a single list of length N.
 
For, say, N=0(10)100 and p=17, worst-case comparison-count behavior is then
 
0, 1, 1.18, 1.76, 2.35, 2.94, 3.53, 4.12, 4.71, 5.29, 5.88
 
versus
 
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
 
The performance ratios here,  0/0, 1/10, 1.18/20, . . . , are

undefined, 0.1, 0.059, 0.059, 0.059, 0.059, 0.059, . . .
 
For N = 1000, 1000/17 = 58.82, and 58.82/1000 = 0.05882.
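
The arithmetic can be checked with a few lines of Python.  The function name average_sublist_length is hypothetical, and the sketch computes N/p directly, so for small N it can fall below one comparison.

```python
def average_sublist_length(n: int, p: int = 17) -> float:
    """Average sublist length when n entries are spread over p sublists."""
    return n / p


if __name__ == "__main__":
    for n in list(range(10, 101, 10)) + [1000]:
        avg = average_sublist_length(n)
        # The ratio of sublist search length to single-list length is
        # avg / n, which is 1/p for every n > 0.
        print(f"N={n:5d}  N/p={avg:6.2f}  ratio={avg / n:.5f}")
```

For any N > 0 the ratio is simply 1/p, about 0.0588 for p = 17.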
 
Your expert probably misunderstood the question you asked him.
 
Take care to use a prime hashing modulus.  For, say, p = 24 = 2 x 2 x 2 x 3, you 
would get clustering, longer sublists, at its prime factors, the hash values 2 
and 3.  
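
One way to see the general hazard of a composite modulus is a small experiment.  The key set below (multiples of 4) and the function name occupied_buckets are assumptions chosen for illustration; the sketch shows what happens when keys share a factor with the modulus, not the exact clustering pattern described above.

```python
def occupied_buckets(keys, p: int) -> list:
    """Return the sorted list of remainders actually hit by the keys."""
    return sorted({k % p for k in keys})


# Hypothetical key set: 1000 keys that are all multiples of 4.
keys = range(0, 4000, 4)

# With p = 24, which shares the factor 4 with every key, only the
# remainders that are multiples of 4 can ever occur.
composite = occupied_buckets(keys, 24)

# With the prime p = 23, which shares no factor with 4, the same keys
# reach every remainder from 0 to 22.
prime = occupied_buckets(keys, 23)

if __name__ == "__main__":
    print(len(composite), "of 24 sublists used:", composite)
    print(len(prime), "of 23 sublists used")
```

With the composite modulus the same 1000 entries are squeezed into a quarter of the sublists, so each occupied sublist is about four times as long.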

John Gilmore Ashland, MA 01721-1817 USA

                                          
