On Tue, Dec 17, 2013 at 09:43:37PM +0100, Dylan Knutson wrote: > It's not a datastructure persay, but Google's LevelDB is very good, > quite fast arbitrary key-value storage system (I use it in my > project relating a 1 word key to a 200 word value) supporting batch > transactional operations. [...]
I'm not looking for a transactional DB, though; I'm looking for a fast key-value storage system with very high insert/query throughput, and hopefully good space requirements. Deletion is not needed, though for very large problems it might be useful to reduce disk space usage. It must be able to handle huge numbers of insertions (say 10 million entries for small/moderate problem instances, perhaps in the billions or trillions, if I use it on a large problem instance), and must be able to do so very fast -- the faster the better. Preferably, it would take advantage of the n-dimensional locality that I described in my other post. Basically, the idea is to reduce I/O roundtrips by caching disk pages in memory, because there are far more entries than will fit in memory all at once. But since the number of entries is very large, to prevent thrashing I have to maximize the number of cache hits. It does no good if I have to load 100 disk pages to answer 100 queries; if 100 pages fit in RAM, I'd like most of the entries in those 100 pages to be used in answering queries before they get swapped out for new pages to be loaded in. In the ideal case (probably not attainable), those 100 pages will contain exactly the next, say, million entries I'm about to query, so that once I'm done with them, I can swap those pages out and never have to load them back in again, giving the space for the next 100 pages to be loaded and queried. So the layout of the disk pages should be as closely corresponding to the order the algorithm will be looking up entries as possible. (Unfortunately it's not 100% predictable, as it depends on the problem being solved, but at least there are general trends we can take advantage of.) A generic storage scheme like SQLite will probably not perform as fast as I'd like. T -- It is impossible to make anything foolproof because fools are so ingenious. -- Sammy
