Hello all,
at #couchdb we discussed how partial replication could be implemented.
We discussed the pros and cons, and davisp asked me to write an
email to d...@. Well, here it is...
===
Basically, two approaches to partial replication were discussed (the
names are mine; please substitute more appropriate ones):
a) "Push-pull scenario":
a client wishing to get some documents replicated sends the replicating
server a design doc containing a predicate. The predicate determines
which docs are to be replicated ("pull replication").
b) "Pull-pull scenario":
a DB admin adds a set of design docs, which a client then triggers to
retrieve the docs (or just the set of docids).
While a) is far more versatile, b) leaves the admin with more control
over what happens to his/her database.
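For comparison, scenario b) can be modelled in a few lines of Python. This is a hypothetical sketch, not CouchDB's API: AdminFilterRegistry, register and select_ids are made-up names, and plain Python callables stand in for the design docs the admin would install. The point it illustrates is that the client only picks a filter name and parameters; the code that actually runs always comes from the admin.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: a replication filter sees one document plus the client's parameters.
Predicate = Callable[[dict, dict], bool]


class AdminFilterRegistry:
    """Scenario b): only the DB admin installs filters; clients pick one by name."""

    def __init__(self) -> None:
        self._filters: Dict[str, Predicate] = {}

    def register(self, name: str, predicate: Predicate) -> None:
        # Roughly corresponds to the admin saving a design doc with a filter in it.
        self._filters[name] = predicate

    def select_ids(self, name: str, docs: List[dict],
                   params: Optional[dict] = None) -> List[str]:
        # Clients can trigger a registered filter, but never ship their own code.
        if name not in self._filters:
            raise KeyError(f"no such filter: {name!r}")
        predicate = self._filters[name]
        return [doc["_id"] for doc in docs if predicate(doc, params or {})]
```

The trade-off is the one described above: the client is limited to whatever the admin anticipated, but it can never submit a predicate of its own.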
My concerns with a) are:
- it breaks with the principle "payload is payload, and code is code";
- it opens the door to several DoS attacks. Imagine a predicate doing
while(1) {}. Setting a low per-doc timeout (as Jan suggested) doesn't
really solve the CPU-hogging issue: a hostile predicate can still burn
the full timeout on every single document, so the total CPU cost still
grows with the size of the database.
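To put rough numbers on the timeout argument, the Python snippet below computes the worst-case CPU time a predicate like while(1) {} can consume when it runs into the per-doc timeout on every document. The function name and the figures used with it are hypothetical, chosen only for illustration.

```python
def worst_case_cpu_seconds(num_docs: int, per_doc_timeout_ms: int,
                           concurrent_requests: int = 1) -> float:
    """Upper bound on CPU time a hostile predicate can burn despite a per-doc timeout.

    Every document runs until the timeout, so the bound grows linearly with the
    number of documents and with the number of concurrent replication requests.
    """
    # Multiply first and divide last so that round inputs give exact results.
    return num_docs * per_doc_timeout_ms * concurrent_requests / 1000
```

With a (hypothetical) 50 ms timeout and 1,000,000 documents, worst_case_cpu_seconds(1_000_000, 50) gives 50000.0 CPU seconds, i.e. almost 14 hours for a single replication request; ten concurrent requests multiply that by ten.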
So, this all boils down to the following questions:
- What principles should selective replication follow?
- How can we establish a system of trust for foreign JavaScript code
(e.g. code signing and the like)?
- Is the solution to all this to make the replication regime a DB
configuration option?
Although it would break with several of CouchDB's traditions, a
descriptive approach could be both secure and versatile. Something
like this (simplified JSON):
  replicate: {
    input: {
      <Param1>: { type: int, default: 42 },
      <Param2>: { type: string, default: "wiki/" }
    },
    filter: {
      <set of filters>
    }
  }
with a filter being defined recursively:
  filter: {
    type: <and,or,xor,not>,
    filter: <recursive definition of a filter>
  }
Here the filter types and, or, xor and not describe how the recursive
sub-filters are composed. A second type of filter is the match filter:
  filter: {
    type: match,
    filter: <Json struct>
  }
where <Json struct> is any JSON object, which may contain placeholders
of the form '$<Param1>$' that are dynamically substituted with the
script's input variables.
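To make the proposal more concrete, here is a minimal Python sketch of how a server might interpret such a spec. It is an illustration under several assumptions, not a proposed implementation: placeholders are written '$Name$', the only parameter types are int and string, the top-level filter is a single filter node, and/or/xor hold a list of sub-filters while not holds exactly one. A match filter is assumed to mean "every field in the struct occurs in the document with an equal value", and xor is assumed to mean "an odd number of sub-filters match". All function names (resolve_inputs, substitute, matches, evaluate, replicate_ids) are hypothetical.

```python
import re

# Assumed encoding: placeholders look like '$Name$'; parameter types are "int"/"string".
PLACEHOLDER = re.compile(r"\$(\w+)\$")
TYPES = {"int": int, "string": str}

# Hypothetical example spec, mirroring the simplified JSON above.
EXAMPLE_SPEC = {
    "input": {
        "Param1": {"type": "int", "default": 42},
        "Param2": {"type": "string", "default": "wiki/"},
    },
    "filter": {"type": "or", "filter": [
        {"type": "match", "filter": {"path": "$Param2$"}},
        {"type": "match", "filter": {"priority": "$Param1$"}},
    ]},
}


def resolve_inputs(input_spec, supplied):
    """Merge client-supplied parameters with the declared defaults, checking types."""
    unknown = set(supplied) - set(input_spec)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, decl in input_spec.items():
        value = supplied.get(name, decl["default"])
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPES[decl["type"]]):
            raise TypeError(f"{name}: expected {decl['type']}")
        resolved[name] = value
    return resolved


def substitute(template, params):
    """Replace '$Name$' placeholders inside a match struct with parameter values."""
    if isinstance(template, dict):
        return {key: substitute(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [substitute(value, params) for value in template]
    if isinstance(template, str):
        whole = PLACEHOLDER.fullmatch(template)
        if whole:  # a bare placeholder keeps the parameter's type (e.g. int)
            return params[whole.group(1)]
        return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)
    return template


def matches(pattern, doc):
    """Subset match: every field in `pattern` must occur in `doc` with an equal value."""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            key in doc and matches(value, doc[key]) for key, value in pattern.items()
        )
    return pattern == doc


def evaluate(flt, doc, params):
    """Recursively evaluate a filter node against one document."""
    kind = flt["type"]
    if kind == "match":
        return matches(substitute(flt["filter"], params), doc)
    if kind == "not":
        return not evaluate(flt["filter"], doc, params)
    if kind not in ("and", "or", "xor"):
        raise ValueError(f"unknown filter type: {kind!r}")
    results = [evaluate(sub, doc, params) for sub in flt["filter"]]
    if kind == "and":
        return all(results)
    if kind == "or":
        return any(results)
    return sum(results) % 2 == 1  # assumed n-ary xor: an odd number of matches


def replicate_ids(spec, docs, supplied=None):
    """Return the ids of the documents selected by a `replicate` spec."""
    params = resolve_inputs(spec["input"], supplied or {})
    return [doc["_id"] for doc in docs if evaluate(spec["filter"], doc, params)]
```

For example, given the documents {"_id": "a", "path": "wiki/"}, {"_id": "b", "priority": 42} and {"_id": "c", "path": "blog/"}, replicate_ids(EXAMPLE_SPEC, docs) selects a and b, while passing {"Param2": "blog/"} selects b and c. Evaluation only walks the filter tree and the document, so there is no way to express something like while(1) {}.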
With such a descriptive filter definition we can
a) match several types of documents by using an or-filter whose
sub-filters are match-filters, and
b) more importantly, reason about the filters and enforce security
constraints before running them (e.g. a maximum filter depth of 3, only
or-filters allowed, at most 2 match-filters, etc.), since a filter is
plain data rather than arbitrary code.
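Point b) is what makes the descriptive approach attractive: because a filter is a plain data tree, the server can validate it before evaluating anything. Below is a minimal Python sketch, assuming that and/or/xor filters hold a list of sub-filters and a not filter holds exactly one, and using the example limits from the list as defaults (match is allowed alongside or, since otherwise nothing could be matched). check_constraints is a hypothetical name.

```python
def check_constraints(flt, max_depth=3,
                      allowed_types=frozenset({"or", "match"}), max_matches=2):
    """Statically check a filter tree before it is ever evaluated.

    Raises ValueError on the first violated constraint; returns True otherwise.
    The root node counts as depth 1.
    """
    match_count = 0

    def walk(node, depth):
        nonlocal match_count
        if depth > max_depth:
            raise ValueError(f"filter deeper than {max_depth} levels")
        kind = node.get("type")
        if kind not in allowed_types:
            raise ValueError(f"filter type {kind!r} not allowed")
        if kind == "match":
            match_count += 1
            if match_count > max_matches:
                raise ValueError(f"more than {max_matches} match filters")
            return
        # Assumed encoding: 'not' wraps a single sub-filter, and/or/xor wrap a list.
        children = [node["filter"]] if kind == "not" else node["filter"]
        for child in children:
            walk(child, depth + 1)

    walk(flt, 1)
    return True
```

Since the check only walks the tree once, its cost is bounded by the size of the submitted filter, independent of the database.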
I would like to hear what you think about all the different approaches.
Martin
P.S.: Again, sorry for not being able to provide code. The sole purpose
of this email is to document some thoughts and others' ideas.