Dear devs,

Currently, the URI handler for /_all_dbs just lists, recursively, all the DB files in the database dir (the database_dir parameter of the .ini file).
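For reference, today's behaviour is simply a flat listing of every DB on the server, regardless of who asks (the DB names below are made up):

    GET /_all_dbs

    ["accounting", "hr", "joe_notes"]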
Since we now have a _security object per DB (I don't know why it's not a regular doc), which allows restricting access to each DB, that code is no longer appropriate. It makes sense for this handler to return only the list of DBs the user has access to. It's through this URI that Futon, for example, lists the available DBs. There's a ticket for this: https://issues.apache.org/jira/browse/COUCHDB-661

That solution is acceptable if the number of DBs in the server is "just" up to about 10 000 or so. I tested with 7 500 DBs, each occupying about 1 MB and containing 100 docs, and the response time for _all_dbs was about 4 seconds (more details in the comments of that ticket). The problem is that for each DB file found, we have to read its header and then read its _security object to figure out whether the session user can access that DB. That means 2 disk read operations per DB file, so 1 million DBs would imply 2 million disk reads.

Obviously, an efficient solution would be to have a view that maps users to DBs. I have an incomplete idea for this. What I thought about is the following:

1) Have a special DB, named "_dbs" for example, which would contain meta information about every available DB (like the meta tables in Oracle, SQL Server, and so on).

2) That DB would contain one doc for each available DB. Each doc would contain the reader names and roles associated with the corresponding DB (this is the only kind of info we need for _all_dbs).

3) We would have a view, like Brian Candler suggested in a comment to that ticket, that emits keys like:

   emit(['name', name], db)
   emit(['role', role], db)

   (a more complete sketch of such a map function is included further below)

4) For DBs whose _security object has empty lists for both the reader names and the reader roles, we would emit a special role, "_public" for example.

5) Whenever the _security object of a DB is updated, we would update the corresponding reader names and roles in the _dbs DB.

I thought of some issues for which I don't have a solution yet:

1) If a user just copies DB files from elsewhere (another server or a backup, for example) into the DBs directory, how do we detect them? Scanning all the DB files at startup and taking the proper action would be potentially slow. Also, if a DB file is copied in while CouchDB is running, I don't know how to detect it. The only idea I have right now is: every time a DB file is opened (due to a user request), we check whether _dbs has a corresponding entry, and if not we take the proper action.

2) If a user deletes a DB file manually (i.e. rm db_file.couch), how do we detect it and remove the corresponding entry in _dbs?

3) If a user restores a DB file backup containing an old _security object, we need to detect that and update the entry in _dbs. A way to do this would be to store the DB update seq number in the corresponding doc in _dbs and then use the same idea as in 1).

These are very preliminary ideas. I would like to collect suggestions from all of you on how to implement this efficiently, and to know whether you can point out any other problems I haven't thought about.
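To make the view idea a bit more concrete, here is a rough sketch. The doc layout below (db_name, update_seq, readers) and the doc _id are just placeholders I made up for illustration, not a settled format; each doc in _dbs would mirror the reader part of one DB's _security object, plus the update seq mentioned in issue 3):

    {
        "_id": "accounting",
        "db_name": "accounting",
        "update_seq": 1342,
        "readers": {
            "names": ["joe", "anna"],
            "roles": ["accountant"]
        }
    }

And a map function covering both points 3) and 4):

    function(doc) {
        // reader names and roles copied from the DB's _security object
        var names = (doc.readers && doc.readers.names) || [];
        var roles = (doc.readers && doc.readers.roles) || [];
        if (names.length === 0 && roles.length === 0) {
            // point 4): no reader restrictions at all, so emit the
            // special "_public" role
            emit(["role", "_public"], doc.db_name);
        } else {
            for (var i = 0; i < names.length; i++) {
                emit(["name", names[i]], doc.db_name);
            }
            for (var j = 0; j < roles.length; j++) {
                emit(["role", roles[j]], doc.db_name);
            }
        }
    }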
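Assuming such a view lives in, say, a design doc _design/meta under the view name by_reader (both names made up here), the _all_dbs handler could then answer a request from a user named "joe" with the role "accountant" by a single multi-key view query, instead of opening every DB file:

    POST /_dbs/_design/meta/_view/by_reader
    {
        "keys": [
            ["name", "joe"],
            ["role", "accountant"],
            ["role", "_public"]
        ]
    }

The union of the values returned for those keys (with duplicates removed) would be the list of DBs that user is allowed to see.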
thanks

best regards,

--
Filipe David Manana,
[email protected]

PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."