On Tue, Dec 17, 2013 at 08:54:17PM +0100, Craig Dillabaugh wrote: > On Tuesday, 17 December 2013 at 19:24:30 UTC, tn wrote: > >On Tuesday, 17 December 2013 at 19:09:49 UTC, H. S. Teoh wrote: > >>However, they do exhibit good spatial locality in higher-dimensional > >>space (i.e., if entry X is accessed first, then the next entry Y is > >>quite likely to be close to X in that space). Does anybody know of > >>a good data structure that can take advantage of this fact to > >>minimize disk accesses? > > > >Perhaps some kind of k-d tree? Maybe even some sort of cache > >oblivious k-d tree to make it more suitable for storage on the > >disk? > > Its not cache-oblivious buy he could try the KDB tree: > > http://dl.acm.org/citation.cfm?id=582321 > > or, again for batched queries the is the > > Bkd-tree (search on Bkd-tree and Lars Arge) - this paper is a type > of buffer tree so the operations need to be able to be run as a > batch.
I skimmed over the KDB tree paper, and am reading the Bkd-tree paper now; it appears that the latter is an improvement over the former, and has better update performance characteristics. However, it also appears much more complicated to implement. Another point I forgot to mention, is that currently I only ever perform insertion and queries, and the two can actually be combined (basically the table is kept for uniqueness testing). I don't need to delete entries from the table. Eventually I might look into that to reduce disk space requirements, but as far as the actual algorithm is concerned, it's not necessary. So there's potentially a good amount of room here for optimization by foregoing the ability to delete entries. I'm currently considering using simpler, "dumber" file structures for storing nodes, with an in-memory Bloom filter to eliminate a (hopefully) large number of I/O roundtrips, since I'm expecting that a good fraction of queries will return 'not found'. T -- "A one-question geek test. If you get the joke, you're a geek: Seen on a California license plate on a VW Beetle: 'FEATURE'..." -- Joshua D. Wachs - Natural Intelligence, Inc.
