Hello,
I was wondering if there were any docs or explanations available on
how SQLite decides to lay out data on disk.

The context is the monotone project -- http://venge.net/monotone --
a VCS where we store project history in a sqlite database.  Sqlite has
been _very_ good to us, and we'd rather not get into the game of
writing our own transactional storage system, for really obvious
reasons.  However, it's starting to look like we have some performance
bottlenecks here, that might push us that way.

In particular, consider the problem of reconstructing bitstrings
given a bunch of xdelta's (basically one-way patches) and full texts;
a typical problem would be to calculate a chain of deltas that applied
one-by-one to a full text will give the desired text, then pull those
deltas and the full text out of the db and apply them.  We just store
these text objects as columns in two tables.

In its current implementation in monotone, this algorithm seems to be
seek-bound.  Some profiling shows that there are cases where we're
spending more time blocked waiting for read()s for the db, than it
takes to read the entire db 5 times over.  This makes sense.  We're
basically doing random reads scattered all over the file, so the OS
has no way to do any sort of readahead, which means we're probably
hitting disk seek latency on every read, and like usual these days,
latency trumps bandwidth.

So, the question is how to fix this.  Some (well, one) of our
competitors use a custom file format tweaked to minimize this disk
latency.  Like I said, we'd rather not go this way if we can avoid it;
single-file storage is _so_ nice to have.  But, while obviously our
current mechanism is sub-optimal -- we neither write out the deltas in
a helpful order, or index them in any useful way, so there's no way
SQLite _could_ optimize for our access pattern even if it wanted to --
I have no idea what sort of tricks I could use to help SQLite help me.
How does SQLite choose where to put new records on disk, and is there
any way I can give it useful hints?

-- Nathaniel

P.S.: Sorry for the over-long message :-).

-- 
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephan Spender

Reply via email to