On 17.12.2013. 20:08, H. S. Teoh wrote:
> Another OT thread to pick your brains. :)
>
> What's a good, efficient file structure for storing extremely large
> lookup tables? (Extremely large as in > 10 million entries, with keys
> and values roughly about 100 bytes each.) The structure must support
> efficient adding and lookup of entries, as these two operations will be
> very frequent.
>
> I did some online research, and it seems that hashtables perform poorly
> on disk, because the usual hash functions cause random scattering of
> related data (which are likely to be accessed with higher temporal
> locality), which incurs lots of disk seeks.
>
> I thought about B-trees, but they have high overhead (and are a pain to
> implement), and also only exhibit good locality if table entries are
> accessed sequentially; the problem is I'm working with high-dimensional
> data and the order of accesses is unlikely to be sequential. However,
> they do exhibit good spatial locality in higher-dimensional space (i.e.,
> if entry X is accessed first, then the next entry Y is quite likely to
> be close to X in that space). Does anybody know of a good data
> structure that can take advantage of this fact to minimize disk
> accesses?
>
>
> T
The SQLite file format seems to be fairly well documented: http://www.sqlite.org/fileformat.html
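
Not a full answer to the spatial-locality question, but before even digging into the file format it may be worth noting what SQLite gives you out of the box: a table with a declared primary key is stored with its own B-tree index and served through a page cache, so it makes a quick baseline for a disk-backed key/value store. Below is a minimal sketch in Python; the table name "kv" and file name "table.db" are placeholders, not anything from this thread. Since your keys encode high-dimensional coordinates, interleaving their bits first (Z-order / Morton order, sketched in morton2 below for the 2-D case) is a standard trick to make spatially close entries land on nearby B-tree pages.

import sqlite3

# Sketch only: SQLite as an on-disk key/value store. The names here
# ("kv", "table.db") are made up for illustration.
db = sqlite3.connect("table.db")

# A BLOB primary key gets its own B-tree, so inserts and point lookups
# cost O(log n) page reads, and SQLite's page cache absorbs repeated
# accesses to nearby entries.
db.execute("CREATE TABLE IF NOT EXISTS kv "
           "(key BLOB PRIMARY KEY, value BLOB NOT NULL)")

def put(key, value):
    # One transaction per call keeps the sketch simple; batching many
    # puts into a single transaction avoids an fsync per insert.
    with db:
        db.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                   (key, value))

def get(key):
    row = db.execute("SELECT value FROM kv WHERE key = ?",
                     (key,)).fetchone()
    return row[0] if row else None

def morton2(x, y):
    # Interleave the bits of two 32-bit coordinates (Z-order / Morton
    # order) so that points close together in 2-D space get numerically
    # close keys, and therefore land on nearby B-tree pages.
    key = 0
    for i in range(32):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# Usage sketch: put(morton2(x, y).to_bytes(8, "big"), value)
# (big-endian fixed-width bytes preserve the Morton ordering as BLOBs)

Whether that beats a hand-rolled structure at 10+ million entries is an empirical question, but it is a one-afternoon baseline to measure any custom format against.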
