Hello, I was wondering if there were any docs or explanations available on how SQLite decides to lay out data on disk.
The context is the monotone project -- http://venge.net/monotone -- a VCS where we store project history in a sqlite database. Sqlite has been _very_ good to us, and we'd rather not get into the game of writing our own transactional storage system, for really obvious reasons. However, it's starting to look like we have some performance bottlenecks here, that might push us that way. In particular, consider the problem of reconstructing bitstrings given a bunch of xdelta's (basically one-way patches) and full texts; a typical problem would be to calculate a chain of deltas that applied one-by-one to a full text will give the desired text, then pull those deltas and the full text out of the db and apply them. We just store these text objects as columns in two tables. In its current implementation in monotone, this algorithm seems to be seek-bound. Some profiling shows that there are cases where we're spending more time blocked waiting for read()s for the db, than it takes to read the entire db 5 times over. This makes sense. We're basically doing random reads scattered all over the file, so the OS has no way to do any sort of readahead, which means we're probably hitting disk seek latency on every read, and like usual these days, latency trumps bandwidth. So, the question is how to fix this. Some (well, one) of our competitors use a custom file format tweaked to minimize this disk latency. Like I said, we'd rather not go this way if we can avoid it; single-file storage is _so_ nice to have. But, while obviously our current mechanism is sub-optimal -- we neither write out the deltas in a helpful order, or index them in any useful way, so there's no way SQLite _could_ optimize for our access pattern even if it wanted to -- I have no idea what sort of tricks I could use to help SQLite help me. How does SQLite choose where to put new records on disk, and is there any way I can give it useful hints? -- Nathaniel P.S.: Sorry for the over-long message :-). -- "Of course, the entire effort is to put oneself Outside the ordinary range Of what are called statistics." -- Stephan Spender