davisp opened a new pull request #1370: [5/5] Clustered Purge Implementation
URL: https://github.com/apache/couchdb/pull/1370

## Overview

This PR implements clustered purge. It's big.

There are roughly five large chunks for clustered purge:

1. Single shard APIs
2. Updates to couch_mrview
3. Internal replication
4. Read-repair
5. Clustered API

At the shard level the major change is the addition of two new indexes for purge requests. These indexes store the history of purge requests to a shard (rather than the old method of storing only the most recent purge). This allows secondary indices and internal replication to manage their eventual consistency. The shard-level work is the bulk of the PR in that it's adding both the implementation and a whole bunch of tests for clustered purge.

One note on naming, internally and in commit notes: a "purge request" is what's sent in a single HTTP request to the purge endpoint. A "purge info" is a single `{DocId, Revs}` entry. The new indexes in the shard deal with purge infos because, once stored, they lose any notion of which request they arrived in.

The changes to couch_mrview are fairly straightforward after the single node API changes. The only interesting part of this commit is how we use a _local doc to track how far the secondary index has processed the purge sequence. This is necessary so that compaction knows when it can discard purge infos.

Internal replication is fairly straightforward. This update just ensures that we synchronize our purge infos between each shard so that we don't inadvertently undo a purge request when synchronizing shards.

Read-repair is somewhat tricky. The thing to note here is that when we open a document we track which revisions came from which nodes. Then, if read-repair is required, we pass that information along with the update request.
This way a node can filter out requests from any node for which either a) the node is not up to date with its synchronization, or b) we have explicitly purged a revision and that purge has not yet been sent to the node the revision came back from. That last bit of logic may sound odd, but the thing to remember is that we could completely remove a revision from a cluster and then re-introduce it later on. This means that we have to reject re-application of a revision for a bounded period of time (rather than rejecting it forever).

The clustered API is rather straightforward for anyone familiar with any of the existing fabric coordinator and HTTPD handler code.

## Testing recommendations

`make check`

Clustered purge comes with some fairly extensive testing, though I'm very much open to suggestions on new tests.

## Related Issues or Pull Requests

This PR depends on:

- #1366
- #1367
- #1368
- #1369

## Checklist

- [x] Code is written and works correctly (hopefully ;)
- [x] Changes are covered by tests;
- [ ] Documentation reflects the changes;

For docs, this doesn't change the existing purge APIs, but I will need to open a PR against the docs to write up some of the caveats and footgun warnings that we want to add for this.
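
As a rough illustration of the purge-sequence bookkeeping described in the Overview, here is a toy Python model. It is only a sketch, not the Erlang code in this PR: the `PurgeLog` class, its method names, and the client ids are all hypothetical, and it assumes compaction keeps everything when no client has checkpointed yet. It shows a per-shard history of purge infos keyed by purge sequence, clients (such as a secondary index or an internal replication peer) recording how far they have processed, and compaction discarding only the purge infos that every client has already seen.

```python
from dataclasses import dataclass, field


@dataclass
class PurgeLog:
    """Toy model of one shard's purge history (hypothetical, not CouchDB code)."""

    infos: dict = field(default_factory=dict)        # purge_seq -> (doc_id, revs)
    checkpoints: dict = field(default_factory=dict)  # client id -> last purge_seq processed
    purge_seq: int = 0

    def purge(self, doc_id, revs):
        """Record one purge info and return the purge sequence assigned to it."""
        self.purge_seq += 1
        self.infos[self.purge_seq] = (doc_id, tuple(revs))
        return self.purge_seq

    def since(self, seq):
        """Return the purge infos a client still has to process after `seq`."""
        return [(s, info) for s, info in sorted(self.infos.items()) if s > seq]

    def checkpoint(self, client, seq):
        """Stand-in for the _local doc: remember how far `client` has processed."""
        self.checkpoints[client] = seq

    def compact(self):
        """Drop purge infos that every checkpointed client has already processed."""
        if not self.checkpoints:
            return 0  # assumption: with no checkpoints, keep the whole history
        oldest = min(self.checkpoints.values())
        dropped = [s for s in self.infos if s <= oldest]
        for s in dropped:
            del self.infos[s]
        return len(dropped)


log = PurgeLog()
log.purge("doc1", ["1-abc"])
log.purge("doc2", ["2-def", "2-ghi"])
log.purge("doc3", ["1-xyz"])

log.checkpoint("index-a", 2)        # the index has processed purges 1 and 2
log.checkpoint("replication-b", 1)  # the replication peer has only seen purge 1
print(log.compact())                # only purge 1 is safe to discard
print(log.since(1))                 # what replication-b still needs to process
```

The slowest client bounds what compaction may discard, which is why each consumer of the purge history needs its own checkpoint.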
