The problem with b), admin-enforced replication policies, is that they aren't really enforceable. The replicator is just an agent of the user who invoked it: it can choose to follow rules set by the admin, or it can follow its own. You can't give a user access to the database and also enforce that they replicate it only in the way the admin specified. If the user can perform a certain update in the database through the regular API, he can also do so via the replicator.

Therefore the answer is not to distinguish between replicated and direct updates, but to enforce the same security rules either way. Either this user can update this document with these values, or he can't; it doesn't matter whether the update is replicated or direct.
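A minimal Python sketch of "same rules either way", assuming a hypothetical in-memory `Database` with a single `validate_update` hook. The class, the function names and the author-only rule are illustrative, not CouchDB's API. Both the direct `put` path and the replicator's `apply_replicated` path go through the same `_write` check, with the replicator acting as the user who invoked it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset = field(default_factory=frozenset)


class Forbidden(Exception):
    """Raised when a user may not make a given update."""


def validate_update(new_doc, old_doc, user):
    """Hypothetical rule: admins may write anything; other users may only
    create or modify documents whose "author" field is their own name."""
    if "admin" in user.roles:
        return
    if new_doc.get("author") != user.name:
        raise Forbidden(f"{user.name} may not write as {new_doc.get('author')!r}")
    if old_doc is not None and old_doc.get("author") != user.name:
        raise Forbidden(f"{user.name} may not modify {old_doc['_id']!r}")


class Database:
    def __init__(self):
        self.docs = {}

    def _write(self, doc, user):
        # The single choke point: every write is checked the same way.
        validate_update(doc, self.docs.get(doc["_id"]), user)
        self.docs[doc["_id"]] = doc

    def put(self, doc, user):
        """A direct update made through the regular API."""
        self._write(doc, user)

    def apply_replicated(self, docs, user):
        """Updates arriving via the replicator, which acts as the invoking user.
        Rejected docs are skipped and their ids returned."""
        rejected = []
        for doc in docs:
            try:
                self._write(doc, user)
            except Forbidden:
                rejected.append(doc["_id"])
        return rejected
```

Since the check sits below both entry points, there is no separate replication policy for a user-controlled replicator to bypass.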

This, like much of CouchDB, is very much inspired by how Lotus Notes already works. Notes does partial replication and has signed design elements and scripts, and cryptographically verifiable users. The Notes model treats the user who replicates an update as the person performing the update. The Notes security model is more rigid and more thoroughly integrated than what I plan. CouchDB will instead provide all the hooks necessary to build a Notes-like security system, but with more flexibility, since the security model can be customized further.

If you are worried about runaway scripts and scripts that use too many resources, then the only real option is to provide a query language that is not Turing-complete, or to restrict code to a subset of the language. But even when you know it terminates, it's hard to limit how expensive the operations are and how often they are invoked. There is never really a good option here: anything constrained enough to make time/space guarantees is often too limited to be useful. Timeouts suck, but so does everything else.

-Damien


On Feb 16, 2009, at 11:54 AM, Martin Scholl wrote:

Hello all,


At #couchdb we discussed how partial replication could be implemented.
We discussed pros and cons, and davisp asked me to write an email to
d...@. Well, here it is...

===

Basically, two approaches to partial replication were discussed (the names
are my own; please substitute more appropriate ones):

a) "Push-pull scenario":
- a client wishing to get some documents replicated sends the replicating
server a design doc containing a predicate. The predicate determines
which docs are replicated ("pull replication").

b) "Pull-pull scenario":
- a DB admin adds a set of design docs, which a client then triggers to
retrieve the docs (or the set of docids).

While a) is far more versatile, b) leaves the admin more control over
what happens to his/her database.

My concerns with a) are:
- it breaks the principle "payload is payload, and code is code";
- it opens the door to several DoS attacks. Imagine a predicate doing
while(1) {}. Setting the per-doc timeout to a low value (as Jan suggested)
doesn't really solve the CPU-hogging issue, since that timeout is spent
again for every document the predicate is run on.
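A back-of-the-envelope bound makes the CPU-hogging point concrete. Assume (hypothetically) a per-doc timeout t_doc and a replication that evaluates the predicate on N documents; a predicate that always loops forever then burns the full timeout on every document. The numbers below are illustrative, not measurements:

```latex
T_{\text{worst}} = N \cdot t_{\text{doc}}, \qquad
N = 10^{6},\; t_{\text{doc}} = 10\,\text{ms}
\;\Rightarrow\;
T_{\text{worst}} = 10^{6} \times 10^{-2}\,\text{s} = 10^{4}\,\text{s} \approx 2.8\,\text{h}
```

Lowering t_doc only shrinks this bound linearly, and it applies per replication request, so a client can multiply it by starting several replications.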

So this all boils down to these questions:
- what principles for selective replication should be employed?
- how can we establish a system of trust for foreign JavaScript (e.g.
code signing and the like)?
- is the solution to all this to make the replication regime a
per-database configuration option?

Although it would break with several of CouchDB's traditions, a descriptive
approach could be both secure and versatile. Something like this
(simplified JSON):

replicate: {
 input: {
    <Param1>: { type: int; default: 42; },
    <Param2>: { type: string; default: "wiki/"}
 }

 filter: {
    <set of filters>
 }
};

with a filter being recursively defined:
 filter: {
   type: <and,or,xor,not>;
   filter: <recursive definition of a filter>
 }

where the and/or/xor/not filter family describes how the recursive
sub-filters are composed, and a second type of filter:

filter: {
 type: match;
 filter: <Json struct>
}

with <Json struct> being any JSON object, which may contain placeholders
such as '$<Param1>$' that are dynamically replaced with the script's
input variables.

With such a descriptive filter we can:
a) match several types of documents by using an or-filter with several
match-filters as its sub-filters;
b) more importantly, reason about the filters and enforce security
constraints (e.g. max. filter depth of 3, only or-filters allowed, at
most 2 match-filters, etc.).
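A minimal Python sketch of an evaluator and policy checker for the proposed filter language. It assumes details the proposal leaves open: composite filters (and/or/xor/not) hold a list of sub-filters under "filter", and `not` takes exactly one; a match filter succeeds when every field of its JSON struct appears in the document with an equal value; placeholders are written `$Name$` without the angle brackets. All function names are hypothetical, and the default policy values are taken from the examples above.

```python
import re

# '$Name$' placeholders, standing in for the proposal's '$<Param1>$' constructs.
PLACEHOLDER = re.compile(r"\$(\w+)\$")


def resolve_params(input_spec, supplied):
    """Start from each input's declared default, then apply caller-supplied values."""
    params = {name: spec["default"] for name, spec in input_spec.items()}
    params.update(supplied)
    return params


def substitute(value, params):
    """Recursively replace placeholders inside a JSON-like value."""
    if isinstance(value, dict):
        return {k: substitute(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, params) for v in value]
    if isinstance(value, str):
        whole = PLACEHOLDER.fullmatch(value)
        if whole:  # the whole string is one placeholder: keep the param's type
            return params[whole.group(1)]
        return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), value)
    return value


def matches(pattern, doc):
    """Assumed 'match' semantics: every field in pattern is in doc with an equal value."""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            key in doc and matches(sub, doc[key]) for key, sub in pattern.items()
        )
    return pattern == doc


def evaluate(flt, doc, params):
    """Structural recursion over a finite tree, so evaluation always terminates."""
    kind = flt["type"]
    if kind == "match":
        return matches(substitute(flt["filter"], params), doc)
    results = [evaluate(sub, doc, params) for sub in flt["filter"]]
    if kind == "and":
        return all(results)
    if kind == "or":
        return any(results)
    if kind == "xor":  # n-ary xor: true when an odd number of sub-filters match
        return results.count(True) % 2 == 1
    if kind == "not":  # assumed to take exactly one sub-filter
        return not results[0]
    raise ValueError(f"unknown filter type {kind!r}")


def depth(flt):
    if flt["type"] == "match":
        return 1
    return 1 + max((depth(sub) for sub in flt["filter"]), default=0)


def count_nodes(flt, kind):
    own = 1 if flt["type"] == kind else 0
    if flt["type"] == "match":
        return own
    return own + sum(count_nodes(sub, kind) for sub in flt["filter"])


def types_used(flt):
    used = {flt["type"]}
    if flt["type"] != "match":
        for sub in flt["filter"]:
            used |= types_used(sub)
    return used


def check_policy(flt, max_depth=3, allowed=frozenset({"or", "match"}), max_matches=2):
    """Return the violations of a (hypothetical) admin policy; empty means accepted."""
    problems = []
    if depth(flt) > max_depth:
        problems.append("too deep")
    if not types_used(flt) <= allowed:
        problems.append("disallowed filter type")
    if count_nodes(flt, "match") > max_matches:
        problems.append("too many match filters")
    return problems
```

Because a filter is plain data, `check_policy` can reject it before any document is touched, and evaluation is a single pass over a finite tree, which is exactly the guarantee an arbitrary JavaScript predicate cannot give.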

I would like to hear what you think about all the different approaches.


Martin

P.S.: Again, sorry for not being able to provide code. The sole purpose
of this email is to document some thoughts and others' ideas.
