From: "Bill Fairchild" <[email protected]> Sent: Tuesday, 24 August 2010 4:04 AM
Your random input entry is a 32-bit number. Two to the 32nd power is 4GB. Let the 32-bit number represent the number of a bit in a contiguous string of bits that is 4 billion bits long. The number of bytes to contain this array is 537 MB, which can easily be acquired above the bar or in a data space. Find the byte address and bit offset within that byte that corresponds to the 32-bit input number. Test the bit. If off, you have a new entry, so turn the bit on and process the new entry. If on, you have a duplicate entry, so ignore the entry.
That is unworkable. It's also the reason that I asked how large the list is. Until that is known, it's pointless suggesting strategies for solving this kind of problem. Now that we know that the list is "less than a few hundred entries", it would seem that hashing would speed this up considerably (compared with binary search with insertion). Insertion takes n/2 moves on average. Searching takes log n (base 2) steps. Hashing takes one (or a few steps) to look up, and insertion takes one (or a few) steps. Make the table large enough (for, say, 10 x 200 entries) with provision for expanding the table.
