[ https://issues.apache.org/jira/browse/COUCHDB-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774411#comment-13774411 ]

Stéphane Alnet commented on COUCHDB-1893:
-----------------------------------------

> `old_doc` may not exist after compaction

Argh, I mostly use continuous replication so didn't think about that.

> Instead of this, have you thought about adding very trivial logic to your
> filter functions to accept or discard documents with `_deleted:true` if some
> special query argument or header exists?

Since the query arguments/headers are shared for all documents I don't see how
this can be used to decide to filter some deleted documents and not others?
(Besides having the HTTP client pre-compute a list of IDs that need replicated,
but then the replication filter is almost useless.)

Typically I provide query arguments to replication filters and match those
arguments to fields in the document body. So my filter function might look like
this:

[ Quoting from https://github.com/shimaore/ccnq3/blob/master/applications/host/couchapps/main.coffee#L46 ]

    ddoc.filters.local_rules = p_fun (doc,req) ->
      # Always replicate deletions
      if doc._deleted? and doc._deleted
        return true
      # Only replicate provisioning documents.
      if not doc.type?
        return false
      if doc.type is 'rule'
        return doc.sip_domain_name is req.query.sip_domain_name
      if doc._id.match /^_design/
        return false
      return true

(In this example all deletions get replicated out of a very large database,
while the normal record set is pretty small.)

Trying to summarize:
- We could expand the filter API as I mentioned but this would break on
  compaction.
- We could expand the compaction to keep N documents instead of only the last
  one; but then we're asking for a double change in APIs.
- Use PUT+`_deleted:true` and keep simple filters in place; deletions do not
  occur if the fields are not present in the deleted document; DELETE command
  should not be used in those cases.

For now I'll document the workaround and why it is our current best option on
the Wiki.

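To make the third option in the summary concrete, here is a minimal JavaScript sketch of the PUT+`_deleted:true` workaround. It is an illustration under assumptions, not code from ccnq3: `localRules` is a hand translation of the CoffeeScript filter with the blanket "always replicate deletions" rule removed, and `makeTombstone` is a hypothetical client-side helper that builds the body to PUT in place of issuing a DELETE.

```javascript
// Hand translation of the filter quoted above, without the rule that
// replicates every deletion. Assumption: deletions are written with PUT and
// keep the fields the filter inspects.
function localRules(doc, req) {
  // Only replicate provisioning documents. A tombstone created with DELETE
  // has no `type` field, so it is rejected here.
  if (doc.type == null) {
    return false;
  }
  if (doc.type === 'rule') {
    return doc.sip_domain_name === req.query.sip_domain_name;
  }
  if (/^_design/.test(doc._id)) {
    return false;
  }
  return true;
}

// Hypothetical helper: build a deletion body that retains the listed fields,
// so the same filter can still decide whether to propagate the deletion.
function makeTombstone(doc, keep) {
  var tombstone = { _id: doc._id, _rev: doc._rev, _deleted: true };
  keep.forEach(function (field) {
    if (doc[field] !== undefined) {
      tombstone[field] = doc[field];
    }
  });
  return tombstone;
}
```

The result of `makeTombstone(doc, ['type', 'sip_domain_name'])` would be sent with PUT to the document's URL. With this approach a deletion only reaches targets whose `sip_domain_name` query argument matches, and a bare DELETE tombstone is never replicated by this filter.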
> Allow replication filters to meaningfully apply to deleted documents
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-1893
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1893
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: JavaScript View Server
>            Reporter: Stéphane Alnet
>
> A document that is deleted using the DELETE command will be presented to a
> replication filter as an empty record with only a `_deleted:true` field. A
> replication filter can then only use the document ID to decide whether or not
> to propagate the deletion; in most cases this is not sufficient, and one may
> have to pass along deletion documents for IDs that would not have been
> replicated by the filter.
> This might lead to document IDs being leaked to the target database, which
> might be undesirable; more importantly if the goal of filtering was to build
> a smaller subset of the source database (for example to replicate a very
> large database to a device that has smaller storage space), those deletion
> documents might overfill the database (they never get compacted).
> I had somewhat documented this issue on the Wiki
> (http://wiki.apache.org/couchdb/Replication#Filtered_Replication) a while
> back but never got to add it to JIRA.
> Dave Cottlehuber on the PouchDB list suggested to use PUT with a
> `_deleted:true` field to work around the problem (the PUT body can then
> contain data sufficient to enable the filter to work). However we're still
> stuck in case DELETE was used instead.
> My suggestion is to expand the replication filter API to add an optional
> third argument
>     filter(doc,req,old_doc)
> where old_doc if present references the version of the document that will get
> deleted. It is then up to the filter to use the _deleted flag in `doc` and
> the values in `old_doc`.
> (It might be useful/meaningful/easier to add old_doc in all cases; at this
> point I'm only suggesting to add it in the case doc contains a _deleted
> field.)
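
For illustration only, a filter written against the proposed three-argument signature might look like the JavaScript sketch below. This is hypothetical: the issue proposes `old_doc` and does not describe it as existing in CouchDB, and the function name `localRulesWithOldDoc` is invented here. The sketch also assumes the third argument is simply absent when the previous revision is unavailable (for example after compaction, as noted in the comment above).

```javascript
// HYPOTHETICAL: sketch of the proposed filter(doc, req, old_doc) signature.
// The third argument is the proposal under discussion, not an existing API.
function localRulesWithOldDoc(doc, req, oldDoc) {
  // For a deletion, apply the rules to the previous revision when it is
  // provided; otherwise fall back to the (possibly empty) tombstone.
  var subject = (doc._deleted && oldDoc) ? oldDoc : doc;

  // Only replicate provisioning documents.
  if (subject.type == null) {
    return false;
  }
  if (subject.type === 'rule') {
    return subject.sip_domain_name === req.query.sip_domain_name;
  }
  if (/^_design/.test(doc._id)) {
    return false;
  }
  return true;
}
```

In this sketch a DELETE tombstone is propagated only when the deleted revision would itself have passed the filter. When `oldDoc` is missing the bare tombstone is rejected, which is the compaction weakness raised against this proposal.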