From everything I remember about hash tables, you are supposed to minimize the number of collisions when inserting and looking up data. That is what keeps the expected O(1) lookup property intact.

Reading Wikipedia (http://en.wikipedia.org/wiki/Hash_table#Dynamic_resizing):

"To keep the load factor under a certain limit, e.g. under 3/4, many table implementations expand the table when items are inserted. For example, in Java's HashMap class the default load factor threshold for table expansion is 0.75 and in Python's dict, table size is resized when load factor is greater than 2/3."

Well, I just realized that D AA's "load factor" is 4 (see the link below). That means that for a hash table with 4 buckets, you need to add 17 items to trigger a rehash. It also means that, by the pigeonhole principle, you are guaranteed to have at least one bucket with 4 or more elements in it before a rehash -- and that's with a perfectly even hash distribution. With a more realistic distribution, I'd expect some buckets to hold around 8-10 elements.
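
To spell out the arithmetic, here's a throwaway D sketch (not the druntime code; it assumes the strict greater-than comparison implied by the 17-items figure):

import std.stdio;

void main()
{
    enum loadFactor = 4; // nodes allowed per bucket before growing
    enum buckets = 4;

    // The rehash fires on the first insertion that pushes the node
    // count past buckets * loadFactor, i.e. insertion #17 here.
    foreach (nodes; 1 .. 20)
    {
        if (nodes > buckets * loadFactor)
        {
            writefln("rehash triggered on insertion #%s", nodes);
            break;
        }
    }
}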

I don't think this is optimal, but I'm nervous about correcting it -- going from a load factor of 4 to a load factor of 0.75 would be a tremendous jump (for the same number of items it means more than 5x as many buckets, since 4 / 0.75 is about 5.3). I'm not certain how the GC and the list of primes in the AA runtime will fare under that new load.

Any thoughts from the experts out there?

-Steve

The code that uses the factor of 4: https://github.com/D-Programming-Language/druntime/blob/9e410b0bd91ead4439bf304c28c2fce511df436a/src/rt/aaA.d#L221
