Hi Eric,
On Jul 7, 2009, at 3:43 AM, eric casteleijn wrote:
Adam Kocoloski wrote:
On Jul 6, 2009, at 3:58 PM, Chris Anderson wrote:
== Deleted and Conflicts ==
_all_docs_by_seq includes a 'deleted' flag and a list of
'conflicts'.
Should the _changes API do the same?
The plan is to drive replication from changes, so anything needed by
replication is on the roadmap. I don't think it'd hurt to have any
of those, but Damien would be better placed to answer this one.
The deleted=true flag probably won't be needed by the replicator,
because the _changes feed includes the deletion revid. I expect
that the replicator will just download this revision like any
other, find the _deleted:true bit set in the document, and delete
the document on the target.
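
The replicator behaviour described above could be sketched roughly as
follows. This is only an illustration under assumptions: the target
database is modelled as a plain dict, and apply_revision is a
hypothetical helper, not part of CouchDB. A real target would keep a
deletion stub rather than drop the document entirely.

```python
# Hypothetical sketch: the target database is modelled as a dict keyed by
# document id. Nothing here is a real CouchDB API.

def apply_revision(target, doc):
    """Apply one downloaded revision to the target.

    A revision carrying "_deleted": True removes the document; any other
    revision is stored as the current version.
    """
    doc_id = doc["_id"]
    if doc.get("_deleted", False):
        # The deletion revision is fetched like any other revision; the
        # flag inside the document body says to delete on the target.
        target.pop(doc_id, None)
    else:
        target[doc_id] = doc
    return target


target = {"a": {"_id": "a", "_rev": "1-abc", "value": 1}}
apply_revision(target, {"_id": "a", "_rev": "2-def", "_deleted": True})
apply_revision(target, {"_id": "b", "_rev": "1-xyz", "value": 2})
```

After these two calls the target no longer holds "a" and holds "b".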
Note that replication is an important user of _changes, but by no
means the only one, if update notifiers go away (which I think
everyone agrees would be a good idea). I would like to have the
option to not only see when a document was deleted, but in addition
when one was first created on the node in question, which in my
application would require special action, over and above what needs
to happen for an update to a document.
I can see the motivation behind adding created and deleted flags to
the "changeset" included in each row. I certainly didn't mean to
imply that the replicator was the only consumer of the _changes feed.
_conflicts and _deleted_conflicts are more interesting. When one
of these occurs, the document shows up in the _changes feed, but
the revision in that row is the latest revision of the document,
not the conflict/deleted_conflict rev. Unlike with _all_docs_by_seq,
the replicator cannot determine the list of revisions to replicate
solely by analyzing the _changes feed.
I think the most efficient solution is to start including conflict
and deleted_conflict revisions in the revlist in the _changes row.
I don't know the revision tree well enough to know if it's possible
to identify the set of all conflict revisions that were saved after
update_sequence N, but if it is, that would be a neat restriction.
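
To make the proposal concrete, here is a rough sketch of how a
replicator might read such rows. The row shape (a "seq", an "id" and a
"changes" list of revs that also carries conflict revisions) is an
assumption for illustration, and revisions_to_fetch is a hypothetical
helper.

```python
# Hypothetical sketch: each row lists every revision the replicator
# should consider, including conflict and deleted_conflict revisions.

def revisions_to_fetch(rows, since=0):
    """Return (doc id, rev) pairs for all rows newer than `since`."""
    wanted = []
    for row in rows:
        if row["seq"] <= since:
            # Already seen by the replicator; skip.
            continue
        for change in row["changes"]:
            wanted.append((row["id"], change["rev"]))
    return wanted


rows = [
    {"seq": 3, "id": "a", "changes": [{"rev": "2-aaa"}]},
    # Two revs in one row: the winning revision plus a conflict revision.
    {"seq": 5, "id": "b", "changes": [{"rev": "3-bbb"}, {"rev": "3-ccc"}]},
]
pending = revisions_to_fetch(rows, since=3)
```

With since=3 only the row for "b" qualifies, so both of its revisions
end up in the list to replicate.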
This sounds like a good idea.
Another option might be to configure a metadata-only request so
that the replicator could check what revisions exist on the source
for each updated document. Could be a useful thing to have in
general.
And so does this.
What I would like to add:
Right now the _changes feeds are per db, and while that is great in
some use cases (like replication), in others, where there are many
thousands of databases, one global _changes feed would be much more
practical. It is also how the current update notifiers work, so not
having this option would break existing applications, or at least it
would break mine. ;)
Adding a global _changes feed seems useful and a very easy thing to
implement from my end.
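
As a rough illustration of the idea only (not the actual
implementation), a global feed could tag each per-db row with the
database it came from. The global_changes helper and the "db" key are
hypothetical. Sequence numbers stay per database here, so the merged
stream implies no global ordering.

```python
# Hypothetical sketch: `feeds` maps a database name to its list of
# change rows. Nothing here is a real CouchDB API.

def global_changes(feeds):
    """Yield every row from every database, tagged with its db name."""
    for db_name, rows in feeds.items():
        for row in rows:
            tagged = dict(row)      # copy so the original row is untouched
            tagged["db"] = db_name  # hypothetical key naming the source db
            yield tagged


feeds = {
    "users": [{"seq": 1, "id": "alice"}],
    "orders": [{"seq": 1, "id": "o-1"}, {"seq": 2, "id": "o-2"}],
}
merged = list(global_changes(feeds))
```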
One last thing that would be great to have is a way to configure
what information goes into a particular _changes feed or an option
to write your own _changes-like feeds in JavaScript, as you can with
a view, so that one could have a feed of changes to the values of a
particular field, for instance. That way, processes that act on
updates from the db would never have to query back into the db for
additional data, which could be a performance win. (That is, if the
configurable _changes feeds aren't too much of a performance loss.)
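
A feed of changes to a single field might behave roughly like the
sketch below. It is written in Python purely for illustration; the
field_changes function and the event shape are hypothetical, and the
input is assumed to be the successive versions of documents in update
order.

```python
# Hypothetical sketch of a custom feed: emit an event only when the
# value of one chosen field differs from the last value seen for that
# document.

def field_changes(docs, field):
    """Yield an event each time `field` changes for a document."""
    last_seen = {}
    for doc in docs:
        doc_id = doc["_id"]
        value = doc.get(field)
        if doc_id not in last_seen or last_seen[doc_id] != value:
            # The event carries the new value, so a consumer does not
            # need to query the db again.
            yield {"id": doc_id, "field": field, "value": value}
        last_seen[doc_id] = value


updates = [
    {"_id": "a", "status": "new", "note": "x"},
    {"_id": "a", "status": "new", "note": "y"},   # status unchanged
    {"_id": "a", "status": "done", "note": "y"},
]
events = list(field_changes(updates, "status"))
```

Here the second update changes only "note", so the feed for "status"
emits two events rather than three.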
I've also filed a feature request in JIRA with these suggestions:
https://issues.apache.org/jira/browse/COUCHDB-390
Cool, thanks! We can move the discussion there. Best,
Adam
but more discussion, here or on that issue, is most welcome.
--
- eric casteleijn
http://www.canonical.com