(Posted to -dev because it has some development issues)

This is wrong BTW:

       elsif doc["Type"] == "user"
         doc["Roles"] && doc["Roles"].each do |r|
           db.execute("replace into links values (?, ?, ?)", db_name, doc_id, r)
         end

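because it doesn't handle modifications correctly: a role removed from the document leaves its old row behind, since replace only inserts or overwrites rows. In my production code I do this: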

       db.execute("delete from links where db = ? and src = ?", db_name, doc_id)
       doc["Roles"] && doc["Roles"].each do |r|
         db.execute("insert into links values (?, ?, ?)", db_name, doc_id, r)
       end

i.e. always delete and recreate the derived document. You can do incremental updates by reading from your indexes before updating. You cannot reliably get the previous rev (for differencing) because it may not exist.
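
For the incremental variant, here is a rough sketch of reading from the index before updating. It assumes the links table has columns (db, src, dst) and that db.execute returns rows as arrays, as the sqlite3 gem does by default. The names link_changes and sync_links and the dst column are made up for illustration.

```ruby
require "set"

# Hypothetical helper: work out which role links to drop and which to add.
# Pure function, so it can be checked without a database.
def link_changes(current_roles, new_roles)
  current = Set.new(current_roles)
  wanted  = Set.new(new_roles || [])   # a doc without "Roles" wants no links
  { delete: (current - wanted).to_a, insert: (wanted - current).to_a }
end

# Hypothetical helper: apply the difference for one document.
# Assumes a links table with columns (db, src, dst); dst is an assumed name.
def sync_links(db, db_name, doc_id, roles)
  # Read the current state from the index, not from a previous rev.
  rows = db.execute("select dst from links where db = ? and src = ?",
                    [db_name, doc_id])
  changes = link_changes(rows.map(&:first), roles)

  changes[:delete].each do |r|
    db.execute("delete from links where db = ? and src = ? and dst = ?",
               [db_name, doc_id, r])
  end
  changes[:insert].each do |r|
    db.execute("insert into links values (?, ?, ?)", [db_name, doc_id, r])
  end
end
```

Whether this beats delete-and-recreate depends on how many rows a document typically owns; for a handful of roles the simple version is probably fine.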

My code also doesn't handle a database being deleted and then re-created: the _external will think it has valid records, but they belong to the previous database. You could handle that through notifications, but once again I think it needs to be synchronous if you want to reason about it. A likely-to-work-most-of-the-time solution would be to detect update_seq < stored_update_seq. A better solution would be for each db to have a UUID, so that you don't have to rely on the name as the identity.
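
The update_seq check could look something like the sketch below. It assumes update_seq is an integer and that the _external keeps the last sequence it indexed for each database. The function names are hypothetical, and the db argument is anything with a sqlite3-style execute.

```ruby
# Heuristic only: a re-created database that has already received more
# updates than the old one ever had will not be detected.
def db_probably_recreated?(stored_update_seq, current_update_seq)
  return false if stored_update_seq.nil?   # nothing indexed yet
  current_update_seq < stored_update_seq
end

# Drop every derived row for db_name so the index is rebuilt from scratch.
# Returns true if a reset happened.
def reset_index_if_recreated(db, db_name, stored_update_seq, current_update_seq)
  return false unless db_probably_recreated?(stored_update_seq, current_update_seq)
  db.execute("delete from links where db = ?", [db_name])
  true
end
```

The missed case in the comment is exactly why this only works most of the time, and why a per-db UUID would be the more robust identity.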

Also, if your _external doesn't get triggered for a long time, and while it's 'dormant' a document is deleted and the db is compacted, you could miss deletions. One solution is for every _external to be notified (synchronously) before a compaction so that it can update to the update_seq of the MVCC snapshot that the compaction will operate against. IMO a better solution is to have two UUIDs for the database: one per database, and one 'per compaction'. Thus an external will know if it needs to revalidate all the documents it has indexed to check for missed deletion updates. You could just have a per-compaction UUID, which would change if a db was deleted and then created, thus triggering the same codepath, but this is a lot more expensive than knowing that the entire db

Finally, note that this external operates for *every* database, whereas you may want to enable and configure it using a design document. In that case your external should always monitor updated design documents and check for enablement. You can record the configuration in the database (and cache it in the _external) and just ignore all other changes. Personally I don't bother, because the lazy creation means that no work is done unless I do an _external query, so databases that don't get queried don't incur a cost, and I have no configuration data.
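
If you did want design-document enablement, the cache might be sketched like this. The design document id "_design/links" and the "links_external" field are invented for the example. Only the "_id" and "_deleted" keys are standard document fields.

```ruby
# Assumed name of the design document that switches the external on.
CONFIG_DOC_ID = "_design/links"

# Update the per-database config cache from one changed document.
# cache maps db_name => config hash; all other documents are ignored.
def update_config_cache(cache, db_name, doc)
  return cache unless doc["_id"] == CONFIG_DOC_ID

  config = doc["links_external"]        # assumed field name
  if doc["_deleted"] || config.nil?
    cache.delete(db_name)               # disabled: forget this database
  else
    cache[db_name] = config             # enabled: remember its settings
  end
  cache
end

# The external would do work for db_name only when this returns true.
def external_enabled?(cache, db_name)
  cache.key?(db_name)
end
```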

That's another reason to prefer a passive UUID-based identity scheme for db-create/delete and compaction detection rather than a notification system.

It would be good if each DB had two UUIDs, one per-db and one per-compaction, i.e. changed in the MVCC snapshot during a compaction, and that these be provided to every _external request.
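
To make the proposal concrete, here is a sketch of what an _external could do if it received both UUIDs. Nothing like this exists today; db_uuid and compaction_uuid are hypothetical names for the two proposed values, and index_action is just an illustration.

```ruby
# Decide how much work the external must do before answering a query.
# stored:  the UUID pair recorded when the index was last updated (or nil)
# current: the UUID pair supplied with this request (hypothetical fields)
def index_action(stored, current)
  # New name-to-db binding: the old rows belong to a different database.
  return :rebuild if stored.nil? || stored[:db_uuid] != current[:db_uuid]

  # Same db, but compacted since we last looked: deletions may have been
  # missed, so every indexed document needs to be rechecked.
  return :revalidate if stored[:compaction_uuid] != current[:compaction_uuid]

  # Nothing structural changed: process updates since stored update_seq.
  :incremental
end
```

The point of keeping two values is that a rebuild can throw the rows away in one statement, while a revalidation has to check every indexed document individually.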

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If at first you don’t succeed, try, try again. Then quit. No use being a damn fool about it
  -- W.C. Fields
