Implementing the data directory (was Re: Fwd: About Berkeley DB)

Rob Browning Mon, 15 May 2000 20:07:45 -0700
Alessio Bragadini <[EMAIL PROTECTED]> writes:

> Since it's closely related to the discussion we've had about a
> proper DB backend, I am forwarding this lengthy message from the
> PostgreSQL development mailing list.

Very interesting.  I hadn't realized that their db had so much
functionality these days.  For those interested, see these two
pointers:

  http://www.sleepycat.com/docs/ref/intro/what.html
  http://www.sleepycat.com/docs/ref/intro/do.html

I've been working a bit on a "filestore API".  This is to allow us to
switch to storing all the data associated with a set of accounts as a
subdirectory rather than a single file.  We'd have one file in there
that stores the engine data that's in our current single data file,
and then we'd have other files for other things.
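Here is a minimal sketch of that subdirectory layout, in Python. The file names (`engine.dat`, `report-prefs`) and the helpers `save_section`/`load_section` are hypothetical and are not part of any existing filestore API. Each section is its own file. It is written via a temporary file and `os.replace` so that a crash mid-write doesn't clobber the previous copy.

```python
import os
import tempfile

# Hypothetical name for the file holding today's single-file engine data.
ENGINE_FILE = "engine.dat"

def save_section(data_dir: str, name: str, payload: bytes) -> None:
    """Write one section as its own file inside the data directory."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, name)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(payload)
    # Swap the new copy into place so readers never see a half-written file.
    os.replace(tmp, path)

def load_section(data_dir: str, name: str) -> bytes:
    """Read one section back from the data directory."""
    with open(os.path.join(data_dir, name), "rb") as f:
        return f.read()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        books = os.path.join(root, "mybooks")
        save_section(books, ENGINE_FILE, b"...engine data...")
        save_section(books, "report-prefs", b"style=compact")
        print(sorted(os.listdir(books)))  # ['engine.dat', 'report-prefs']
```

This is exactly the "rolling our own" part: locking, transactions and recovery across several such files would all be up to us.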

However, after poking around in the db docs (from now on, when I say
db here, I mean the Sleepycat db), I'm wondering if it might be a much
better candidate for that job than "rolling our own" fs subtree
approach.  I'm setting aside the question of whether we might want to
break out the current engine data as db tables.  For now I'm just
interested in whether db might be the best way to give us the
"sectional data store".

Things to think about:

  - db supports a "subname" argument in the "open" call, so you can
    have multiple db tables inside one file.  With even marginally
    careful subnaming, we can use the db for all our "sectional
    subfiles" right now, and still be able to break out the engine
    data into tables later if we decide it's a good idea.

  - db appears to have industrial-strength locking, transactions, and
    logging with full recoverability if requested.

  - db has routines to dump to and load from text-format files, so
    people interested in accessing their data as text will have a
    well-documented method.

  - db has a threaded interface with an optional daemon for deadlock
    detection and resolution (probably not that useful for us for a
    long time).

  - db supports multiple readers/writers.

  - db supports hot backups using normal Unix tools.  You can just
    tar/cp/whatever your file while it's open with no ill effects, as
    long as the app properly uses the built-in transaction mechanism.

  - db seems to be extremely portable, which might be a substantial
    bonus if we ever want to whip up a tiny embedded client for
    handhelds or whatever.

  - db already has Perl, Python, Tcl, Java, C, and C++ interfaces.
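Here is the subname idea from the first point in concrete form. Berkeley DB does this natively through the database-name argument to its open call. Python's standard library no longer ships a Berkeley DB binding, so the sketch below only imitates the idea: it multiplexes named sections into one `dbm.dumb` file using a key prefix. `SectionalStore` and the section names are hypothetical. Real db subdatabases are separate structures inside the file, not prefixed keys.

```python
import dbm.dumb
import os
import tempfile
from typing import List

class SectionalStore:
    """Hypothetical sketch: several named sections sharing one key/value file.

    Only imitates db subnames by prefixing each key with its section name.
    """
    SEP = b"\x00"  # assumes section names never contain a NUL byte

    def __init__(self, path: str):
        self._db = dbm.dumb.open(path, "c")  # create the file if missing

    def _key(self, section: str, key: bytes) -> bytes:
        return section.encode("utf-8") + self.SEP + key

    def put(self, section: str, key: bytes, value: bytes) -> None:
        self._db[self._key(section, key)] = value

    def get(self, section: str, key: bytes) -> bytes:
        return self._db[self._key(section, key)]

    def keys(self, section: str) -> List[bytes]:
        """List the keys of one section, with the section prefix stripped."""
        prefix = section.encode("utf-8") + self.SEP
        return sorted(k[len(prefix):] for k in self._db.keys()
                      if k.startswith(prefix))

    def close(self) -> None:
        self._db.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = SectionalStore(os.path.join(d, "books"))
        # The engine data goes in as one opaque blob in its own section.
        store.put("engine", b"blob", b"<opaque engine data>")
        store.put("prefs", b"register-style", b"ledger")
        print(store.get("engine", b"blob"))  # b'<opaque engine data>'
        print(store.keys("prefs"))           # [b'register-style']
        store.close()
```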

So, now that I've listed a bunch of pluses, I'm sure there are people
out there with more experience than me with this db.  What are the
minuses?

Also, though this probably belongs in a separate discussion, and I
haven't thought heavily about it yet, what about actually using it as
our engine data storage format (e.g. splits as a subtable,
transactions as a subtable, accounts as a subtable, etc.)?  It's not
as sophisticated as SQL (though that's possibly as much a plus as a
minus) since it's only a one-dimensional hash, but you can build
anything you want on top of that, and compared to our current
approach, it seems like it might be better across the board.
Of course, this question is orthogonal to whether or not it's the
right thing to use to implement the "sectional data store" for
non-engine data.
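This sketch shows roughly what "building on top of a one-dimensional hash" would mean for the engine data. A Python dict stands in for a single db hash, and records are stored under `table/id` keys. A hand-maintained secondary index lets us find a transaction's splits without scanning every record. The key layout, the class name `HashTables`, and the record fields are all hypothetical.

```python
import json

class HashTables:
    """Hypothetical sketch: table-like records on top of one flat key/value map.

    Records live under "<table>/<id>".  Splits are also indexed by transaction
    under "split-by-txn/<txn>/<split>".  Assumes ids never contain "/".
    """
    def __init__(self):
        self.kv = {}  # stands in for a single db hash

    def put(self, table, rec_id, record):
        key = f"{table}/{rec_id}"
        if table == "splits" and key in self.kv:
            # Drop the stale index entry if the split moved to another txn.
            old = json.loads(self.kv[key])
            self.kv.pop(f"split-by-txn/{old['txn']}/{rec_id}", None)
        self.kv[key] = json.dumps(record)
        if table == "splits":
            self.kv[f"split-by-txn/{record['txn']}/{rec_id}"] = ""

    def get(self, table, rec_id):
        return json.loads(self.kv[f"{table}/{rec_id}"])

    def splits_for(self, txn_id):
        """Return a transaction's splits via the index, ordered by split id."""
        prefix = f"split-by-txn/{txn_id}/"
        ids = sorted(k[len(prefix):] for k in self.kv if k.startswith(prefix))
        return [self.get("splits", i) for i in ids]

if __name__ == "__main__":
    db = HashTables()
    db.put("accounts", "checking", {"name": "Checking"})
    db.put("accounts", "groceries", {"name": "Groceries"})
    db.put("transactions", "t1", {"desc": "Market"})
    db.put("splits", "s1", {"txn": "t1", "account": "checking", "amount": -4200})
    db.put("splits", "s2", {"txn": "t1", "account": "groceries", "amount": 4200})
    print([s["account"] for s in db.splits_for("t1")])  # ['checking', 'groceries']
```

Note that indexes become our job. Every write to a split must also update `split-by-txn`, which is exactly the bookkeeping SQL would otherwise do for us.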

The nice thing is that if we do decide for now to just use it for our
non-engine data (where the engine data is just a binary "blob" stored
as one of the subnames in the database file), it should be a very
straightforward step to break out the engine data into sub-tables
later if that turns out to be a good idea.

Thoughts?

(Bracing for a long thread...)

-- 
Rob Browning <[EMAIL PROTECTED]> PGP=E80E0D04F521A094 532B97F5D64E3930