On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote: > On 11/12/2012 05:27 PM, Dmitry Olshansky wrote: > >>The current "workaround" is to just dummy it. > >>alias bool Dummy; > >>Dummy[string] names; > >>names["Paul"] = true; //Ew. > > > >Make a wrapper? > > The problem with wrapped versions of associative arrays is that they > just don't scale with number of keys. If I want to store (say) a > set of 3 million size_t's, then it costs a lot more than 3_000_000 * > size_t.sizeof() to do so this way. > > Do dedicated set implementations do better than this?
Probably. For storing integer types, you could optimize for space by dividing your key space into blocks of, say, 32 entries each. Then your hash function will hash only the upper (n-5) bits of the key, and each hash slot will be a bit array of 32 bits, which are indexed by the lower 5 bits of the key. Assuming your keys are relatively densely populated, this will save quite a significant amount of space. In the sparse case, you probably don't waste that much space either, since each slot is only one 32-bit int wide. You can probably also increase the size of the slots to, say, 64 bits or more, if you want to maximize the usage to GC allocation block overhead ratio. Yes, each slot is only 1 int wide, but the GC does have to maintain bookkeeping information that's probably a lot more than that. Allocating a large number of very small blocks is generally worse than a smaller number of medium-sized blocks. If your keys are *very* dense, then storing ranges instead of individual elements is the way to go (though the implementation will be significantly more complex). For storing sets of strings, though, you'd want something like a trie instead. It all depends on your specific application. So a per application implementation might not be out of place. > I should say that my own interests in an implementation of set are > twofold -- first, efficient storage; and second, being able to do an > efficient foreach across the stored values, without any concern for > order. With the hash + bit-array approach above, foreach would be very simple (just walk the hash table slots and enumerate the bits in each slot). > >Or rather call them sets and have them in the library. > > The whole builtin-vs.-library thing seems a big distraction -- if > something needs to be implemented, the library seems the natural > place unless there's a very good reason otherwise. Yeah, there's definitely an advantage to minimizing the built-in stuff and keeping them in the library. Keeps the language relatively clean and simple, but still provide convenient library types for common tasks. T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
