In the discussion of adding metadata to a bunch of files, Christian points
out that you can either limit queries to directories within a single
database or apply a query to multiple databases.

My question: when or why would you prefer one approach over the other?

In my case I'm using BaseX to reflect the XML contents of git
repositories. My current approach is to create a separate database for
each repo/branch pair, my reasoning being that that makes it easiest to
limit queries to just that branch. Because the BaseX data is intended to
be a read-only reflection of the git-managed source, it also makes it easy
to clear the data for a branch if it's gotten out of sync (or I suspect
it's gotten out of sync) by simply dropping the database.

I have complete control over the queries (through a library of functions
that understand the git nature of the databases), so I could just as
easily use a single database with subdirectories that reflect the repos
and branches.

In this scenario, as an example, is there any compelling reason to use one
approach or the other?

I like having one database per branch because that seems like a natural
mapping that generally keeps things simple and more or less obvious (e.g.,
doing "list" will show the list of databases, which reflect the repo and
branch names in their names).

In this application the scale will usually be relatively small: 1000s or
10s of 1000s of individual documents in any given branch, but the querying
and indexing, which supports maintaining knowledge of the links within the
XML content, could get intense.

Cheers,

Eliot 

—————
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com


