"Steve Krulewitz" <[EMAIL PROTECTED]> wrote:
> Hey all --
> 
> In the application I am working on (Songbird), we have a simple two
> table schema representing the tracks in your music collection and the
> properties on those tracks.  The keys used are all UUIDs (128 bit
> number) which we are currently storing in hex string form in a text
> column, so they literally appear in the database as
> "ee89b823-e709-40c5-bed7-dcb0b2b791a8".  We do lots of lookups and
> joins on these keys.
> 
> I was wondering if there is much to be gained by storing these 128 bit
> values in binary rather than as strings.

Storing the UUIDs as BLOBs rather than as hex-encoded strings
will requires 21 bytes per key instead of 43.  So your indices
will have about twice their current fanout.  That means that
you can expect to roughly double your lookup performance on
a cold cache.  (If the database, or large parts of it, is 
already in your OS cache, the difference will be much less
and will likely not be noticable.)

The downside of using BLOBs is that you cannot see them easily
during debugging when you do a "SELECT * FROM...".  You have
to use constructs like, "SELECT hex(uuid), ... FROM" which is
more typing.  You can fix that with a VIEW, I suppose.

The thing to consider is why you are using 128-bit UUIDs in
the first place?  Presumably you are doing this so that you
can sync and/or merge independently created databases without
having to worry about key collisions.  If you are not doing
syncs or merges or independently generated keys, then I can't
think of a good reason to use 128-bit UUIDs in the first
place.  Just use small integer autogenerated keys from SQLite.

Assuming you are doing syncing and merging, what I tend to
do in these kinds of situations is to create a mapping between
the universally unique ID (UUID) and a small integer ID that
is unique in the local database - call the latter the RID.
The RID is just an integer and can be your INTEGER PRIMARY KEY
for very fast lookups.  Looking up a record by INTEGER PRIMARY KEY
is always twice as fast as looking up the same record by any
other key, and because of high fanout can potentially be much
faster if you have a cold cache.  So I use the RID for joins
or other internal operations and only use the UUID for lookups 
from external data.

For example, your schema might look something like this:

   CREATE TABLE album(
     aid INTEGER PRIMARY KEY,      -- Internally unique album ID
     uuid TEXT UNIQUE,             -- Universally unique album ID
     ...
   );
   CREATE TABLE track(
     tid INTEGER PRIMARY KEY,      -- Internally unique track ID
     uuid TEXT UNIQUE,             -- Universally unique track ID
     aid INTEGER REFERENCES album, -- Album containing this track
     ...
   );
   CREATE INDEX track_album ON track(aid);

A typical query might be to find all tracks of an album:

   SELECT * FROM track 
   WHERE aid=(SELECT aid FROM album WHERE title=?)

And queries like this will run much faster using INTEGER
PRIMARY KEYS rather than UUIDs.

All that said, for even the largest music collection, your
database is unlikely to have more than a few thousand albums
and a few tens of thousands of tracks, and with such a small
database and running on a modern workstations, probably just
about anything you do is going to be fast enough.  The
optimizations described above are useful if you have tables
with millions of entries or if you are doing thousands
of queries per second or if you are running on some woefully 
underpowered portable music player.  But for a personal
music library running on a workstation at human-interaction
speed, use whatever schema makes it easiest for you to type 
in correct and working code.

--
D. Richard Hipp <[EMAIL PROTECTED]>



-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to