On 11/12/2012 9:44 PM, H. S. Teoh wrote:
On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:
On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
The current "workaround" is to just dummy it.
alias bool Dummy;
Dummy[string] names;
names["Paul"] = true; //Ew.

Make a wrapper?
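A minimal sketch of such a wrapper over the built-in AA (the names `Set`, `insert`, `contains` are illustrative, not any standard API):

```d
// Thin set wrapper hiding the dummy bool value behind a set-like API.
struct Set(T)
{
    private bool[T] data;

    void insert(T item) { data[item] = true; }
    bool contains(T item) const { return (item in data) !is null; }
    void remove(T item) { data.remove(item); }
    @property size_t length() const { return data.length; }
}

void main()
{
    Set!string names;
    names.insert("Paul");
    assert(names.contains("Paul"));
    assert(!names.contains("Walter"));
}
```

This cleans up the call site, but as noted below it does nothing about per-entry memory overhead, since it is still a bool-valued AA underneath.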

The problem with wrapped versions of associative arrays is that they
just don't scale with number of keys.  If I want to store (say) a
set of 3 million size_t's, then it costs a lot more than 3_000_000 *
size_t.sizeof() to do so this way.

Do dedicated set implementations do better than this?

Probably.

For storing integer types, you could optimize for space by dividing your
key space into blocks of, say, 32 entries each. Then your hash function
will hash only the upper (n-5) bits of the key, and each hash slot will
be a bit array of 32 bits, which are indexed by the lower 5 bits of the
key. Assuming your keys are relatively densely populated, this will save
quite a significant amount of space. In the sparse case, you probably
don't waste that much space either, since each slot is only one 32-bit
int wide.
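The split described above can be sketched like this. As a simplification, a built-in AA maps the block index (upper bits) to a 32-bit occupancy mask indexed by the lower 5 bits; a real implementation would need its own hash table to control bucket layout:

```d
// Blocked bit-set sketch: one 32-bit mask covers 32 consecutive keys.
struct BlockedSet
{
    uint[size_t] blocks;   // upper (n-5) bits of key -> 32-bit mask

    void insert(size_t key)
    {
        // AA op-assign inserts the block with uint.init (0) if missing.
        blocks[key >> 5] |= 1u << (key & 31);
    }

    bool contains(size_t key) const
    {
        auto p = (key >> 5) in blocks;
        return p !is null && (*p & (1u << (key & 31))) != 0;
    }
}

void main()
{
    BlockedSet s;
    s.insert(42);
    s.insert(43);
    assert(s.contains(42) && s.contains(43));
    assert(!s.contains(44));
    // 42 and 43 share one block: densely packed keys cost 1 bit each.
    assert(s.blocks.length == 1);
}
```

With dense keys this stores up to 32 members per AA entry instead of one, which is where the space saving comes from.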

The problem is that keys will collide, so each slot will have to store the upper bits of the key *and* the small bit set. The bucket itself still sits inside an intrusive linked list.

Otherwise the idea is great. About 3 words of overhead per word-sized set.


You can probably also increase the size of the slots to, say, 64 bits or
more, if you want to maximize the ratio of useful payload to GC
allocation-block overhead. Yes, each slot is only 1 int wide, but the GC
does have to maintain bookkeeping information that's probably a lot more
than that.

16 bytes. Though you'd have to know what a bucket entry looks like, which is sadly impossible with the built-in hash map.

Allocating a large number of very small blocks is generally worse than a
smaller number of medium-sized blocks.

[snip other good points]


--
Dmitry Olshansky
