writes:
<begin snippet>
The entries are short (4 bytes) and the key is the entire entry. My
understanding is that in this situation a hash table has no benefit. Is this
correct?
</end snippet>
No. Having four-byte keys does, however, simplify hashing.
Pick a small prime modulus p. Treat your four bytes as a fullword signed
integer. Divide this integer by p. The POSITIVE remainder r will be in the
interval
0 <= r <= p - 1.
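
A minimal sketch of that hashing step, assuming the four bytes are stored big-endian; the name bucket_index and the choice p = 17 are mine, for illustration only.

```python
P = 17  # small prime modulus (illustrative choice)


def bucket_index(entry: bytes, p: int = P) -> int:
    """Map a 4-byte entry to a sublist index in the range 0..p-1."""
    if len(entry) != 4:
        raise ValueError("expected a 4-byte entry")
    # Interpret the four bytes as a signed fullword (big-endian is assumed).
    value = int.from_bytes(entry, byteorder="big", signed=True)
    # For p > 0, Python's % already yields the non-negative remainder.
    # In C or Java, % can return a negative value for a negative dividend,
    # so there you would write ((value % p) + p) % p instead.
    return value % p
```

Note that a negative fullword still lands in 0..p-1: the bytes FF FF FF FF are -1 as a signed integer, and -1 mod 17 is 16.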
Use an array of pointers to p sublists, slp(0:p-1), chaining your entries
together within each sublist, i.e., inserting them in ascending algebraic
sequence.
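
The structure might be sketched as follows. This is only an illustration: ordinary Python lists stand in for the chained sublists, keys are taken as already-converted signed integers, and the class and method names are hypothetical.

```python
import bisect


class ChainedTable:
    """p sublists, each kept in ascending order (a sketch, not a tuned design)."""

    def __init__(self, p: int = 17):
        self.p = p
        # slp[r] holds every key whose non-negative remainder mod p is r.
        self.slp = [[] for _ in range(p)]

    def insert(self, key: int) -> bool:
        """Insert key in ascending sequence; return False if already present."""
        sublist = self.slp[key % self.p]
        i = bisect.bisect_left(sublist, key)
        if i < len(sublist) and sublist[i] == key:
            return False
        sublist.insert(i, key)
        return True

    def contains(self, key: int) -> bool:
        """Search only the one sublist that key hashes to."""
        sublist = self.slp[key % self.p]
        i = bisect.bisect_left(sublist, key)
        return i < len(sublist) and sublist[i] == key
```

With a true linked chain the search within a sublist would be sequential; bisect is used here only because Python lists allow it.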
Let N be the sum of the numbers of elements in all of these sublists. When you
get a new value, you will then need to search only a sublist of average length
N/p instead of a single list of length N.
For, say, N=0(10)100 and p=17, worst-case comparison-count behavior is then
0, 1, 1.18, 1.76, 2.35, 2.94, 3.53, 4.12, 4.71, 5.29, 5.88
versus
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
The performance ratios here, u (undefined for N=0), 1/10, 1.18/20, . . . , are
u, 0.1, 0.059, 0.059, 0.059, 0.059, 0.059, . . .
For N = 1000, 1000/17 = 58.82, and 58.82/1000 = 0.05882.
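
The arithmetic can be checked with a few lines. This sketch computes N/p directly with no rounding up, so for N=10 it gives 10/17, about 0.59, where the list above shows 1 (presumably because a search of a nonempty sublist costs at least one comparison). The function names are mine.

```python
def average_sublist_length(n: int, p: int) -> float:
    """Average number of entries per sublist when n entries share p sublists."""
    return n / p


def performance_ratio(n: int, p: int):
    """Sublist length over single-list length; None when n == 0 (undefined)."""
    if n == 0:
        return None
    return average_sublist_length(n, p) / n


if __name__ == "__main__":
    p = 17
    for n in list(range(0, 101, 10)) + [1000]:
        ratio = performance_ratio(n, p)
        shown = "u" if ratio is None else f"{ratio:.5f}"
        print(f"N={n:5d}  N/p={average_sublist_length(n, p):6.2f}  ratio={shown}")
```

For any N > 0 the ratio is simply 1/p, which is why it settles at about 0.059 for p=17.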
Your expert probably misunderstood the question you asked him.
Take care to use a prime hashing modulus. For, say, p = 24 = 2 x 2 x 2 x 3, you
would get clustering, i.e., longer sublists, at its prime factors, the hash
values 2 and 3.
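
One way to see a composite modulus misbehave is to feed it keys that share a factor with it. The sketch below is an illustration of mine, using hypothetical keys that are all multiples of 4; it shows one form of clustering and is not a reproduction of the exact effect described above.

```python
def buckets_used(keys, p: int) -> int:
    """Count how many of the p sublists receive at least one key."""
    return len({k % p for k in keys})


def longest_sublist(keys, p: int) -> int:
    """Length of the fullest sublist after hashing every key mod p."""
    counts = [0] * p
    for k in keys:
        counts[k % p] += 1
    return max(counts)


# Hypothetical data: 1000 keys, every one a multiple of 4.
KEYS = range(0, 4000, 4)

if __name__ == "__main__":
    for p in (24, 17):
        print(p, buckets_used(KEYS, p), longest_sublist(KEYS, p))
```

With p=24 these keys can reach only the six remainders 0, 4, 8, 12, 16 and 20, so 18 sublists stay empty and the rest grow correspondingly longer; with p=17 all 17 sublists are used.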
John Gilmore Ashland, MA 01721-1817 USA