[GitHub] davisp opened a new pull request #1370: [5/5] Clustered Purge Implementation

GitBox Fri, 01 Jun 2018 10:44:19 -0700

URL: https://github.com/apache/couchdb/pull/1370
 
 
   ## Overview
   
   This PR implements clustered purge. It's big.
   
   There are roughly five large chunks for clustered purge:
   
   1. Single shard APIs
   2. Updates to couch_mrview
   3. Internal replication
   4. Read-repair
   5. Clustered API
   
   At the shard level, the major change is the addition of two new indexes for purge requests. These indexes store the full history of purge requests to a shard, rather than only the most recent purge as the old method did. This lets secondary indexes and internal replication manage their eventual consistency. This is the bulk of the PR, in that it's adding both the implementation and a whole bunch of tests for clustered purge.
   
   A note on naming, both internally and in commit messages: a "purge request" is what's sent in a single HTTP request to the purge endpoint, and a "purge info" is a single `{DocId, Revs}` entry. The new shard indexes deal only with purge infos, because once stored, an info no longer records which request it arrived in.
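   To make the request/info distinction concrete, here is a toy, in-memory model of a shard that keeps its whole purge history. It is only a sketch under stated assumptions: each stored info is tagged with a monotonically increasing purge sequence and a random UUID, and the "two indexes" are modeled as one dict keyed by sequence and one keyed by UUID. The names `PurgeInfo`, `ShardPurgeHistory`, `purge` and `infos_since` are hypothetical and are not couch_db's actual Erlang API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PurgeInfo:
    """One stored {DocId, Revs} entry (hypothetical shape)."""
    purge_seq: int
    uuid: str
    doc_id: str
    revs: tuple


@dataclass
class ShardPurgeHistory:
    """Toy model of a shard keeping its full purge history in two indexes."""
    by_seq: dict = field(default_factory=dict)   # purge_seq -> PurgeInfo
    by_uuid: dict = field(default_factory=dict)  # uuid -> PurgeInfo
    purge_seq: int = 0

    def purge(self, request):
        """Split one purge request ({doc_id: [rev, ...]}) into purge infos.

        Nothing about the originating request is kept, so after storage the
        infos can no longer be grouped back into requests.
        """
        stored = []
        for doc_id, revs in request.items():
            self.purge_seq += 1
            info = PurgeInfo(self.purge_seq, uuid.uuid4().hex, doc_id, tuple(revs))
            self.by_seq[info.purge_seq] = info
            self.by_uuid[info.uuid] = info
            stored.append(info)
        return stored

    def infos_since(self, seq):
        """Purge infos with a sequence strictly greater than `seq`, in order."""
        return [self.by_seq[s] for s in sorted(self.by_seq) if s > seq]


history = ShardPurgeHistory()
history.purge({"doc-a": ["1-abc"], "doc-b": ["2-def", "2-xyz"]})  # one request, two infos
history.purge({"doc-c": ["1-123"]})
print([(i.purge_seq, i.doc_id) for i in history.infos_since(1)])
# [(2, 'doc-b'), (3, 'doc-c')]
```

   Keeping an index by sequence is what lets a consumer ask "what was purged since I last looked?", which the view and replication changes below depend on.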
   
   The changes to couch_mrview are fairly straightforward after the single-node API changes. The only interesting part of this commit is how we use a `_local` doc to track how far along the purge sequence the secondary index has processed. This is necessary so that compaction knows when it can discard purge infos.
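   A minimal sketch of how such checkpoints could drive compaction, assuming each client that consumes purge history records its processed purge sequence in a `_local` doc, and that compaction may drop any info every client has already processed. The function names and the `_local/purge-mrview-...` doc ids are hypothetical; the real retention rule in couch_db may include other limits not shown here.

```python
def min_required_purge_seq(checkpoints, current_purge_seq):
    """Oldest purge_seq that some client has not yet processed past.

    `checkpoints` maps a hypothetical _local doc id to the purge_seq that
    client has recorded as processed. Assumption: with no clients, every
    stored purge info is safe to discard.
    """
    if not checkpoints:
        return current_purge_seq
    return min(checkpoints.values())


def compact_purge_infos(purge_infos, checkpoints, current_purge_seq):
    """Keep only the purge infos that at least one client has not processed.

    purge_infos maps purge_seq -> (doc_id, revs).
    """
    floor = min_required_purge_seq(checkpoints, current_purge_seq)
    return {seq: info for seq, info in purge_infos.items() if seq > floor}


infos = {1: ("doc-a", ["1-abc"]), 2: ("doc-b", ["2-def"]), 3: ("doc-c", ["1-123"])}
checkpoints = {
    "_local/purge-mrview-view-a": 2,  # hypothetical doc ids
    "_local/purge-mrview-view-b": 1,
}
kept = compact_purge_infos(infos, checkpoints, current_purge_seq=3)
print(sorted(kept))  # [2, 3]
```

   The slowest index (here `view-b` at sequence 1) sets the floor, so a lagging view never loses a purge it still has to apply.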
   
   Internal replication is fairly straightforward. This update just ensures that purge infos are synchronized between shard copies, so that synchronizing the shards doesn't inadvertently undo a purge request.
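   One way to picture this step, as a sketch only: assume every purge info carries a UUID so a target shard copy can recognize infos it already holds, and that the source keeps a per-target checkpoint of the last purge sequence it synchronized. Only the source-to-target direction is shown, and `sync_purges` is a hypothetical helper, not mem3's API. Applying these purges on the target before copying document revisions is what keeps replication from pushing a purged revision back.

```python
def sync_purges(source_infos, target_uuids, checkpoint):
    """Pick the source's purge infos a target shard copy still needs.

    source_infos: list of (purge_seq, uuid, doc_id, revs), sorted by purge_seq
    target_uuids: set of purge-info UUIDs already stored on the target
    checkpoint:   last source purge_seq known to be synced to this target
    Returns (infos_to_apply, new_checkpoint).
    """
    to_apply = []
    new_checkpoint = checkpoint
    for seq, info_uuid, doc_id, revs in source_infos:
        if seq <= checkpoint:
            continue  # already synchronized in an earlier run
        if info_uuid not in target_uuids:
            to_apply.append((info_uuid, doc_id, revs))
        new_checkpoint = seq
    return to_apply, new_checkpoint


source = [
    (1, "u1", "doc-a", ["1-abc"]),
    (2, "u2", "doc-b", ["2-def"]),
    (3, "u3", "doc-c", ["1-123"]),
]
to_apply, new_checkpoint = sync_purges(source, target_uuids={"u2"}, checkpoint=1)
print(to_apply, new_checkpoint)  # [('u3', 'doc-c', ['1-123'])] 3
```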
   
   Read-repair is somewhat tricky. When we open a document, we track which revisions came from which nodes. If read-repair is then required, we pass that information along with the update request. This way a node can filter out revisions from any node that either a) is not up to date with its synchronization, or b) sent back a revision that we explicitly purged but whose purge has not yet been synchronized to that node. That last bit of logic may sound kind of odd, but remember that a revision can be completely removed from a cluster and then legitimately re-introduced later on. This means we have to reject re-application of a revision only for a bounded period of time (rather than forever).
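   The two filtering rules can be sketched as follows, assuming the receiving node knows, for each source node, how far its own purges have been synchronized to that node (a missing entry meaning "unknown"). The function and argument names are hypothetical and this is not fabric's actual read-repair code.

```python
def filter_read_repair(revs_by_node, purge_checkpoints, local_purges):
    """Decide which read-repair revisions a node should accept.

    revs_by_node:      {source_node: [rev, ...]} revisions and where they came from
    purge_checkpoints: {source_node: purge_seq} how far this node's purges have
                       been synchronized to each source node
    local_purges:      list of (purge_seq, purged_rev) recorded on this node
    """
    accepted = []
    for node, revs in revs_by_node.items():
        synced_seq = purge_checkpoints.get(node)
        if synced_seq is None:
            # (a) Source node not known to be up to date: drop all its revisions.
            continue
        # (b) Revisions purged here whose purge has not reached the source node
        #     yet. Purges at or below synced_seq already reached it, so a
        #     matching revision from it must have been re-introduced legitimately.
        unsynced = {rev for seq, rev in local_purges if seq > synced_seq}
        accepted.extend(rev for rev in revs if rev not in unsynced)
    return accepted


revs_by_node = {"node1": ["2-aaa", "3-bbb"], "node2": ["2-aaa"], "node3": ["4-ccc"]}
purge_checkpoints = {"node1": 5, "node2": 2}  # node3 has no checkpoint
local_purges = [(3, "2-aaa"), (6, "3-bbb")]
accepted = filter_read_repair(revs_by_node, purge_checkpoints, local_purges)
print(accepted)  # ['2-aaa']
```

   Here `2-aaa` is accepted from `node1`, whose checkpoint already covers the purge at sequence 3, but rejected from `node2`, which has not received that purge yet. Once the purge is synchronized, the rejection lapses, which is the bounded window described above.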
   
   The clustered API is rather straightforward for anyone familiar with any of the existing fabric coordinator and HTTPD handler code.
   
   ## Testing recommendations
   
   `make check`
   
   Clustered purge comes with fairly extensive tests, though I'm very much open to suggestions for new ones.
   
   ## Related Issues or Pull Requests
   
   This PR depends on:
   
   #1366 
   #1367 
   #1368 
   #1369
   
   ## Checklist
   
   - [x] Code is written and works correctly (hopefully ;)
   - [x] Changes are covered by tests;
   - [ ] Documentation reflects the changes;
   
   For docs: this doesn't change the existing purge APIs, but I will need to open a PR against the docs to write up some of the caveats and footgun warnings that we want to add for this.

