Dear devs,

Currently, the URI handler for /_all_dbs just lists, recursively, all the DB files in the database dir (the database_dir parameter of the .ini file).
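For reference, today's behaviour is simply a flat listing of every DB on the server, regardless of who asks (the DB names below are made up):

    GET /_all_dbs

    ["accounting", "hr", "joe_notes"]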
Since we now have a _security object per DB (I don't know why it's not a regular doc), which allows restricting access to each DB, that code is no longer appropriate. It makes sense for this handler to return only the list of DBs the user has access to. It's through this URI that Futon, for example, lists the available DBs. There's a ticket for this: https://issues.apache.org/jira/browse/COUCHDB-661

That solution is acceptable if the number of DBs in the server is "just" up to about 10 000 or so. I tested with 7 500 DBs, each occupying about 1 MB and containing 100 docs, and the response time for _all_dbs was about 4 seconds (more details in the comments of that ticket). The problem is that for each DB file found, we have to read its header and then read its _security object to figure out whether the session user can access that DB. That means 2 disk read operations per DB file, so 1 million DBs would imply 2 million disk reads.

Obviously, an efficient solution would be to have a view that maps users to DBs. I have an incomplete idea for this. What I thought about is the following:

1) Have a special DB, named "_dbs" for example, which would contain meta information about every available DB (like the meta tables in Oracle, SQL Server, and so on).

2) That DB would contain one doc for each available DB. Each doc would contain the reader names and roles associated with the corresponding DB (this is the only kind of info we need for _all_dbs).

3) We would have a view, like Brian Candler suggested in a comment to that ticket, that emits keys like:

   emit(['name', name], db)
   emit(['role', role], db)

   (a more complete sketch of such a map function is included further below)

4) For DBs whose _security object has empty lists for both the reader names and the reader roles, we would emit a special role, "_public" for example.

5) Whenever the _security object of a DB is updated, we would update the corresponding reader names and roles in the _dbs DB.

I thought of some issues for which I don't have a solution yet:

1) If a user just copies DB files from elsewhere (another server or a backup, for example) into the DBs directory, how do we detect them? Scanning all the DB files at startup and taking the proper action would be potentially slow. Also, if a DB file is copied in while CouchDB is running, I don't know how to detect it. The only idea I have right now is: every time a DB file is opened (due to a user request), we check whether _dbs has a corresponding entry, and if not we take the proper action.

2) If a user deletes a DB file manually (i.e. rm db_file.couch), how do we detect it and remove the corresponding entry in _dbs?

3) If a user restores a DB file backup containing an old _security object, we need to detect that and update the entry in _dbs. A way to do this would be to store the DB update seq number in the corresponding doc in _dbs and then use the same idea as in 1).

These are very preliminary ideas. I would like to collect suggestions from all of you on how to implement this efficiently, and to know whether you can point out any other problems I haven't thought about.
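To make the view idea a bit more concrete, here is a rough sketch. The doc layout below (db_name, update_seq, readers) and the doc _id are just placeholders I made up for illustration, not a settled format; each doc in _dbs would mirror the reader part of one DB's _security object, plus the update seq mentioned in issue 3):

    {
        "_id": "accounting",
        "db_name": "accounting",
        "update_seq": 1342,
        "readers": {
            "names": ["joe", "anna"],
            "roles": ["accountant"]
        }
    }

And a map function covering both points 3) and 4):

    function(doc) {
        // reader names and roles copied from the DB's _security object
        var names = (doc.readers && doc.readers.names) || [];
        var roles = (doc.readers && doc.readers.roles) || [];
        if (names.length === 0 && roles.length === 0) {
            // point 4): no reader restrictions at all, so emit the
            // special "_public" role
            emit(["role", "_public"], doc.db_name);
        } else {
            for (var i = 0; i < names.length; i++) {
                emit(["name", names[i]], doc.db_name);
            }
            for (var j = 0; j < roles.length; j++) {
                emit(["role", roles[j]], doc.db_name);
            }
        }
    }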
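Assuming such a view lives in, say, a design doc _design/meta under the view name by_reader (both names made up here), the _all_dbs handler could then answer a request from a user named "joe" with the role "accountant" by a single multi-key view query, instead of opening every DB file:

    POST /_dbs/_design/meta/_view/by_reader
    {
        "keys": [
            ["name", "joe"],
            ["role", "accountant"],
            ["role", "_public"]
        ]
    }

The union of the values returned for those keys (with duplicates removed) would be the list of DBs that user is allowed to see.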
thanks

best regards,

--
Filipe David Manana,
[email protected]

PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."