Adam Kocoloski wrote:
On Jul 6, 2009, at 3:58 PM, Chris Anderson wrote:
== Deleted and Conflicts ==
_all_docs_by_seq includes a 'deleted' flag and a list of 'conflicts'.
Should the _changes API do the same?
The plan is to drive replication from changes, so anything needed by
replication is on the roadmap. I don't think it'd hurt to have any of
those but Damien would be better to answer this one.
The deleted=true flag probably won't be needed by the replicator,
because the _changes feed includes the deletion revid. I expect that
the replicator will just download this revision like any other, find the
_deleted:true bit set in the document, and delete the document on the
target.
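
A minimal sketch of that flow in Python, to make the point concrete. It
assumes the target is a plain dict keyed by document id (a stand-in, not
CouchDB's storage), and apply_revision is a hypothetical helper name. A
real target would keep a deletion stub rather than dropping the entry.

```python
def apply_revision(target, doc):
    """Apply one replicated revision to a target store (a plain dict here)."""
    doc_id = doc["_id"]
    if doc.get("_deleted", False):
        # The deletion revision is an ordinary revision carrying
        # _deleted: true, so the feed row needs no separate 'deleted' flag.
        target.pop(doc_id, None)
        return "deleted"
    # Any other revision simply replaces what the target holds.
    target[doc_id] = doc
    return "stored"
```

The replicator treats the deletion like any other revision and only
learns that it is a deletion once it has the document body in hand.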
Note that replication is an important user of _changes, but it will by
no means be the only one if update notifiers go away (which I think
everyone agrees would be a good idea). I would like the option to see
not only when a document was deleted, but also when it was first
created on the node in question, which in my application requires
special action over and above what happens for an update to a
document.
_conflicts and _deleted_conflicts are more interesting. When one of
these occurs, the document shows up in the _changes feed, but the
revision in that row is the latest revision of the document, not the
conflict/deleted_conflict rev. Unlike _all_docs_by_seq, it's not
possible for the replicator to determine the list of revisions to
replicate solely by analyzing the _changes feed.
I think the most efficient solution is to start including conflict and
deleted_conflict revisions in the revlist in the _changes row. I don't
know the revision tree well enough to know if it's possible to identify
the set of all conflict revisions that were saved after update_sequence
N, but if it is that would be a neat restriction.
This sounds like a good idea.
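
If the row did carry conflict and deleted_conflict revisions, the
replicator's job could look roughly like the sketch below. The row
shape (an "id" plus a "changes" list of {"rev": ...} entries) and the
known_revs mapping are assumptions made for illustration, and
revisions_to_fetch is a hypothetical name.

```python
def revisions_to_fetch(row, known_revs):
    """Return the revs listed in a _changes row that the target lacks.

    row        -- assumed shape: {"id": ..., "changes": [{"rev": ...}, ...]}
    known_revs -- dict mapping doc id to the set of revs already on the target
    """
    listed = [change["rev"] for change in row["changes"]]
    have = known_revs.get(row["id"], set())
    # Keep the order the feed used, dropping revisions we already hold.
    return [rev for rev in listed if rev not in have]
```

With every leaf revision in the row, the list of revisions to replicate
falls out of the feed alone, without a second request per document.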
Another option might be to configure a metadata-only request so that the
replicator could check what revisions exist on the source for each
updated document. Could be a useful thing to have in general.
And so does this.
What I would like to add:
Right now the _changes feeds are per db. That is great in some use
cases (like replication), but in others, where there are many thousands
of databases, one global _changes feed would be much more practical. It
is also how the current update notifiers work, so not having this option
would break existing applications; at least it would break mine. ;)
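
For illustration, here is roughly what a client has to do today to
emulate a global feed, sketched in Python. The feeds argument (db name
mapped to an iterable of rows) and the name global_changes are
assumptions; in practice each iterable would be a separate connection
to a per-db _changes feed, which is exactly what gets impractical with
thousands of databases.

```python
def global_changes(feeds):
    """Yield (db_name, row) pairs, taking one row per database in turn.

    feeds -- dict mapping a database name to an iterable of change rows
    """
    iterators = {name: iter(rows) for name, rows in feeds.items()}
    while iterators:
        # Copy the keys so exhausted feeds can be dropped while looping.
        for name in list(iterators):
            try:
                row = next(iterators[name])
            except StopIteration:
                del iterators[name]
                continue
            yield name, row
```

Note that this gives no ordering across databases, since sequence
numbers are only meaningful within a single db.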
One last thing that would be great to have is a way to configure what
information goes into a particular _changes feed, or an option to write
your own _changes-like feeds in JavaScript, as you can with a view, so
that one could have a feed of changes to the values of a particular
field, for instance. Processes that act on updates from the db would
then never have to query back into the db for additional data, which
could be a performance win. (That is, if the configurable _changes
feeds aren't too much of a performance loss.)
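
As an example of the kind of feed I mean, here is a sketch of the
"changes to one field" case, written in Python rather than JavaScript
and run on the client side. It assumes the consumer already sees full
document bodies in update order; field_changes is a hypothetical name,
and a field that is absent is treated as None.

```python
def field_changes(updates, field):
    """Yield (doc_id, old_value, new_value) whenever `field` changes.

    updates -- iterable of document bodies, in the order they were updated
    """
    last_seen = {}
    for doc in updates:
        doc_id = doc["_id"]
        old = last_seen.get(doc_id)
        new = doc.get(field)
        if new != old:
            # Only report updates that actually touched this field.
            yield doc_id, old, new
        last_seen[doc_id] = new
```

If the server could emit rows like these directly, the consumer would
need neither the bookkeeping nor the extra round trip per update.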
I've also filed a feature request in JIRA with these suggestions:
https://issues.apache.org/jira/browse/COUCHDB-390
but more discussion, here or on that issue, is most welcome.
--
- eric casteleijn
http://www.canonical.com