(Posted to -dev because it has some development issues)
This is wrong BTW:
    elsif doc["Type"] == "user"
      doc["Roles"] && doc["Roles"].each do |r|
        db.execute("replace into links values (?, ?, ?)",
                   db_name, doc_id, r)
      end
because it doesn't handle modifications correctly. In my production
code I do this:
    db.execute("delete from links where db = ? and src = ?",
               db_name, doc_id)
    doc["Roles"] && doc["Roles"].each do |r|
      db.execute("insert into links values (?, ?, ?)",
                 db_name, doc_id, r)
    end
i.e. always delete and recreate the derived document. You can do
incremental updates by reading from your indexes before updating. You
cannot reliably get the previous rev (for differencing) because it may
not exist.
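
A minimal sketch of the incremental variant, assuming the roles already
indexed for the document have been read back from the links table. The
helper name link_changes is hypothetical and only illustrates the
differencing step; it is not taken from the production code above.

```ruby
# Hypothetical helper: given the roles already indexed for a document
# and the new revision of that document, work out which link rows
# would have to be inserted and which would have to be deleted.
def link_changes(indexed_roles, doc)
  current = Array(doc["Roles"]).uniq   # a missing "Roles" becomes []
  {
    insert: current - indexed_roles,   # roles added in this revision
    delete: indexed_roles - current    # roles removed in this revision
  }
end

indexed = ["admin", "editor"]          # as read from the links table
doc = { "Type" => "user", "Roles" => ["editor", "reviewer"] }
changes = link_changes(indexed, doc)
# changes[:insert] is ["reviewer"], changes[:delete] is ["admin"]
```

The comparison is against the index, not against the previous rev, so it
still works when the previous rev no longer exists.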
My code also doesn't handle a database being deleted and then
re-created: the _external will think it has valid records, but they
belong to a previous database. You could detect that through
notifications, but once again I think it needs to be synchronous if
you want to reason about it. A likely-to-work-most-of-the-time
solution would be to detect update_seq < stored_update_seq. A better
solution would be for each db to have a UUID, so that you don't have
to rely on the name as the identity.
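
The update_seq heuristic could be sketched roughly as follows. The
method name probably_recreated? is made up for illustration, and
stored_update_seq is assumed to be whatever value the _external
recorded the last time it indexed the database.

```ruby
# Heuristic only. A db that was deleted and re-created starts counting
# its update_seq again, so a current value below the stored one suggests
# that the indexed records belong to a previous db of the same name.
# It misses the case where the new db has already received at least as
# many updates as the stored value.
def probably_recreated?(stored_update_seq, current_update_seq)
  return false if stored_update_seq.nil?   # nothing indexed yet
  current_update_seq < stored_update_seq
end

probably_recreated?(120, 3)    # sequence went backwards
probably_recreated?(120, 150)  # sequence moved forwards as expected
```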
Also, if your _external doesn't get triggered for a long time, and
while it's 'dormant' a document is deleted and the db is compacted,
you could miss deletions. One solution to that is that every _external
needs to be notified (synchronously) before a compaction so that it
can update to the update_seq of the MVCC snapshot that the compaction
will operate against. IMO a better solution is to have two UUIDs for
the database: one is per database, and one is 'per compaction'. Thus
an external will know if it needs to revalidate all the documents it
has indexed to check for missed deletions. You could just have
a per-compaction UUID, which would change if a db was deleted and then
re-created, thus triggering the same codepath, but this is a lot more
expensive than knowing that the entire db
Finally, note that this external operates for *every* database,
whereas you may want to enable and configure it using a design
document. Thus your external should always monitor updated design
documents and check for enablement. You can record the configuration
in the database (and cache it in the _external) and just ignore all
other changes. Personally I don't bother because the lazy-creation
means that no work is done unless I do an _external query, so
databases which don't get queried, don't incur a cost, and I have no
configuration data.
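
If you did want design-document enablement, the check on each changed
document might look something like this sketch. The "links_index" key
is a made-up name for wherever the configuration would live; only the
"_design/" id prefix and the "_deleted" flag are standard.

```ruby
# Assumption: the external is configured through a hypothetical
# "links_index" member of a design document.
CONFIG_KEY = "links_index"

def design_doc?(doc)
  doc["_id"].to_s.start_with?("_design/")
end

# Returns the configuration hash if this document enables the external,
# or nil for ordinary documents, deleted design documents, and design
# documents that carry no configuration.
def external_config(doc)
  return nil unless design_doc?(doc)
  return nil if doc["_deleted"]
  doc[CONFIG_KEY]
end

external_config({ "_id" => "_design/app",
                  "links_index" => { "enabled" => true } })
```

The returned hash is what you would cache in the _external; every other
change can be ignored for configuration purposes.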
That's another reason to prefer a passive UUID-based identity scheme
for db-create/delete and compaction detection rather than a
notification system.
It would be good if each DB had two UUIDs, one per-db and one per-
compaction i.e. changed in the MVCC snapshot during a compaction, and
that these be provided to every _external request.
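
To make the proposal concrete, here is a sketch of how an _external
might act on the two UUIDs. Neither UUID exists today; the db_uuid and
compaction_uuid fields, and the sync_action name, are assumptions used
only to illustrate the decision.

```ruby
# Hypothetical: 'stored' is what the _external recorded at its last
# update, 'request' carries the UUIDs proposed in this mail.
def sync_action(stored, request)
  # Different (or unknown) db identity: everything indexed is invalid.
  return :rebuild if stored.nil? || stored[:db_uuid] != request[:db_uuid]
  # Same db, but compacted since the last update: deletions may have
  # been missed, so every indexed document has to be revalidated.
  return :revalidate if stored[:compaction_uuid] != request[:compaction_uuid]
  # Same db, same compaction: a normal incremental update is enough.
  :incremental
end

stored  = { db_uuid: "a1", compaction_uuid: "c1" }
request = { db_uuid: "a1", compaction_uuid: "c2" }
sync_action(stored, request)   # the db was compacted in between
```

Because the check is passive, it works however long the _external has
been dormant, which a notification scheme cannot guarantee.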
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
If at first you don’t succeed, try, try again. Then quit. No use being
a damn fool about it
-- W.C. Fields