03-Nov-2013 02:37, Charles Hixson writes:
I'm contemplating an associative array that will eventually grow to an estimated 64KB in size, assuming it's about half full. It would then be holding around 90,000 entries. Is this reasonable, or should I go with a sorted array and do binary searches? I estimate that binary searches would average around 15 accesses/hit, but that's still a lot faster than disk access. The array would be nearly full, so there would be less wasted space, but, of course, insertions would become quite expensive. (Deletions aren't expected.) In either case the keys would be fixed-length character arrays, and the values would also be simple structs without reference types (so the garbage collector could pretty much ignore them).
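For reference, the sorted-array alternative could look roughly like the sketch below; the Record type, its field names, and the 8-character key length are placeholders, not anything from the actual program:

import std.range : assumeSorted;

struct Record { char[8] key; int value; }   // placeholder key/value layout

// Binary search over a sorted Record[]: about log2(90_000) ~ 16 probes per
// lookup, in line with the ~15 accesses/hit estimate above.
bool find(Record[] records, char[8] key, out int value)
{
    auto sorted = assumeSorted!((a, b) => a.key[] < b.key[])(records);
    auto hit = sorted.equalRange(Record(key));
    if (hit.empty)
        return false;
    value = hit.front.value;
    return true;
}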
90k elements is a small AA. It would work just fine with any sane hash function (e.g. the built-in one).
At around 10M+ elements it may be worth considering space-saving techniques. That said, on x64 100M+ is perfectly OK; on 32-bit I'd advise against it, but only because of the GC.
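A minimal sketch of the built-in AA route, assuming a fixed 8-character key (a string key would work just as well) and a made-up Info value struct:

import std.stdio;

// Plain value struct with no reference types, so the GC can largely ignore it.
struct Info { int count; double weight; }

void main()
{
    Info[char[8]] table;          // built-in AA keyed on a fixed-length char array

    char[8] k = "word0001";
    table[k] = Info(1, 0.5);

    if (auto p = k in table)      // `in` yields a pointer to the value, or null
        writefln("count=%s weight=%s", p.count, p.weight);
}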
FWIW I'm expecting this array to be around for the entire time the program is running, but it would probably be paged out most of the time, as frequently accessed items would be pulled into a much smaller structure, and if I go with the array rather than the hash table I may do updates only in batches. (A batched update is linear in the size of the array but essentially independent of the number of items being added.) My temptation is to go with the hash table...but I'm not sure of the underlying mechanics.
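If the sorted array wins out, the batched update could be a single merge pass, something like this rough sketch (Record is the same placeholder layout as above, and mergeBatch is a made-up name):

import std.algorithm.sorting : sort;

struct Record { char[8] key; int value; }   // placeholder layout as above

// One batched update: sort the pending insertions, then merge them with the
// existing sorted array in a single pass. The merge is linear in the size of
// the existing array and essentially independent of how small the batch is.
Record[] mergeBatch(Record[] existing, Record[] batch)
{
    batch.sort!((a, b) => a.key[] < b.key[]);
    auto merged = new Record[](existing.length + batch.length);
    size_t i, j, k;
    while (i < existing.length && j < batch.length)
        merged[k++] = existing[i].key[] <= batch[j].key[] ? existing[i++] : batch[j++];
    while (i < existing.length) merged[k++] = existing[i++];
    while (j < batch.length)    merged[k++] = batch[j++];
    return merged;
}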
Think of it as a sparse array that has "holes" and takes around 2-4 times the size of a normal array. That's not only because of the holes but also because each slot is a bit bigger.
In return, lookups/inserts/deletions are cheap O(1) with high probability. -- Dmitry Olshansky
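Very roughly, one slot of such a table looks like the toy layout below. This is only an illustration of where the per-slot overhead and the expected O(1) probing come from, not the actual druntime AA implementation:

// Toy open-addressing slot: besides the key and value it carries the cached
// hash (doubling as an "occupied" marker here), which is why a slot is a bit
// bigger than a bare key/value pair. Keeping the table roughly half empty is
// what buys the expected-constant number of probes.
struct Slot(K, V)
{
    size_t hash;   // 0 == empty in this toy scheme
    K key;
    V value;
}

// Linear-probing lookup: hash to a slot, then scan forward past colliding
// entries until the key or a hole is found. Assumes the table length is a
// power of two and the table is never completely full.
V* lookup(K, V)(Slot!(K, V)[] slots, size_t hash, K key)
{
    for (size_t i = hash & (slots.length - 1); ; i = (i + 1) & (slots.length - 1))
    {
        if (slots[i].hash == 0)
            return null;                               // hit a hole: not present
        if (slots[i].hash == hash && slots[i].key == key)
            return &slots[i].value;                    // found it
    }
}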
