Re: Partial replication -orelse- sending interpreted data to another server

Damien Katz Mon, 16 Feb 2009 10:03:29 -0800

The problem with b) admin-enforced replication policies is that it's not really possible. The replicator is just an agent of the user who invoked it; it can choose to follow some rules set by the admin, or it can follow its own rules. You can't give a user access to the database but enforce that they can only replicate it the admin-specified way. If the user can perform a certain update in the database using regular methods, he can also do so via the replicator.

Therefore the answer is to not distinguish between replicated updates and direct updates. Instead, enforce the same security rules either way. This user can update this document with these values, or he can't. It doesn't matter if it's replicated or direct.
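
A minimal sketch of that idea in Python, not CouchDB's actual API: every write, whether direct or replicated, goes through the same check. The `Database` class, `validate_update`, the `owner` field and the owner-or-admin rule are all hypothetical, invented here for illustration.

```python
class Forbidden(Exception):
    """Raised when a user is not allowed to make an update."""


def validate_update(user, old_doc, new_doc):
    # Hypothetical rule: only the document's owner or an admin may write it.
    if "admin" in user["roles"]:
        return
    reference = old_doc if old_doc is not None else new_doc
    if reference.get("owner") != user["name"]:
        raise Forbidden(f"{user['name']} may not update this document")


class Database:
    def __init__(self):
        self.docs = {}

    def _write(self, user, doc_id, new_doc):
        # The single enforcement point shared by both update paths.
        validate_update(user, self.docs.get(doc_id), new_doc)
        self.docs[doc_id] = new_doc

    def put(self, user, doc_id, new_doc):
        """Direct update."""
        self._write(user, doc_id, new_doc)

    def replicate_from(self, user, source_docs):
        """Replicated update: same rules, applied per document.

        Returns the ids of the documents that were rejected.
        """
        rejected = []
        for doc_id, doc in source_docs.items():
            try:
                self._write(user, doc_id, doc)
            except Forbidden:
                rejected.append(doc_id)
        return rejected
```

Because the replicator acts as the user who invoked it, an update that would be refused directly is refused in exactly the same way when it arrives through replication.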

This, like much of CouchDB, is very much inspired by how Lotus Notes already works. Notes does partial replication, has signed design elements and scripts and cryptographically verifiable users. The Notes model treats the user who replicates the update as the person who is performing the update. The Notes security model is more rigid and thoroughly integrated than what I plan. CouchDB instead will provide all the hooks necessary to build a Notes-like security system, but will actually have more flexibility here, as we can customize the security model more.

If you are worried about runaway scripts and scripts that use too many resources, then the only real option is to provide a non-Turing-equivalent query language, or limit the code to a subset of the language. But even though you'll know it terminates, it's hard to limit how expensive the operations are and how often it's invoked. There is never really a good option here; anything constrained enough to make time/space guarantees is often too limited to be useful. Timeouts suck, but so does everything else.


-Damien


On Feb 16, 2009, at 11:54 AM, Martin Scholl wrote:

Hello all,


at #couchdb we discussed how partial replication could be implemented.
We discussed pros and cons, with davisp requesting I should write an
email to d...@. Well, here it is...

===

Basically, 2 approaches to replication were discussed (names freely
added, please substitute with more appropriate ones):

a) "Push-pull scenario":
client wishing to get some documents replicated, sends to the
replicating server a design doc with a predicate in it. The predicate
determines which docs are to be replicated ("pull replication")

b) "Pull-pull scenario":
- a DB admin adds a set of design docs which a client then triggers to
retrieve the docs/the set of docids.

While a) is way more versatile, variant b) leaves the admin with more
control over what happens with his/her database.

My concern with a) is
- it breaks with the principle "payload is payload, and code is code",
- it opens the door to several DoS attacks. Imagine a predicate doing
while(1) {}. Setting the per-doc timeout to a low value (as Jan suggested)
doesn't really solve the CPU hogging issue.

So, this all boils down to the questions:
- what principles for selective replication should be employed?

- how can we establish a system of trust for foreign JavaScript? (e.g.
code-signing and all that stuff)
- is the solution for all this "make the replication regime a db
configuration option"?

Although it would break with several of CouchDB's traditions, a solution
that is secure and versatile could be a descriptive approach. Something
like this (simplified json):

replicate: {
 input: {
    <Param1>: { type: int; default: 42; },
    <Param2>: { type: string; default: "wiki/"}
 }

 filter: {
    <set of filters>
 }
};

with a filter being recursively defined:
 filter: {
   type: <and,or,xor,not>;
   filter: <recursive definition of a filter>
 }

With the filter-family and,or,xor,not describing how the recursive
sub-filters should be composed, and a 2nd type of filter:

filter: {
 type: match;
 filter: <Json struct>
}

with <Json struct> being any json object which may embody constructs
'$<Param1>$' which are dynamically substituted with the script's input
variables.
With such a descriptive filter description we can match
a) several types of documents by using an or-filter together with
several sub-filters which are match-filters
b) more importantly: we can start reasoning on the filters and enforce
several security constraints (e.g. max. filter depth: 3, only or-filters
allowed, only 2 match filters allowed, etc.).
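
To make the descriptive approach concrete, here is a rough Python sketch of how such filters could be evaluated and inspected. It rests on assumptions that go beyond the description above: and/or/xor filters carry a list of sub-filters under `filter`, a not-filter carries a single one, a match-filter accepts a document when every key/value pair of its struct appears in the document, and xor means "an odd number of sub-filters match". All function names are made up for illustration.

```python
def substitute(template, params):
    """Replace '$Name$' placeholder strings with the supplied parameter values."""
    if isinstance(template, dict):
        return {k: substitute(v, params) for k, v in template.items()}
    if isinstance(template, list):
        return [substitute(v, params) for v in template]
    if (isinstance(template, str) and len(template) > 2
            and template.startswith("$") and template.endswith("$")):
        return params[template[1:-1]]
    return template


def is_subset(pattern, value):
    """True if every key/value in pattern also appears in value (recursively)."""
    if isinstance(pattern, dict):
        return isinstance(value, dict) and all(
            k in value and is_subset(v, value[k]) for k, v in pattern.items()
        )
    return pattern == value


def sub_filters(flt):
    """Return the direct sub-filters of a filter (none for a match-filter)."""
    if flt["type"] == "match":
        return []
    return [flt["filter"]] if flt["type"] == "not" else flt["filter"]


def matches(flt, doc, params):
    """Evaluate a filter against one document."""
    kind = flt["type"]
    if kind == "match":
        return is_subset(substitute(flt["filter"], params), doc)
    results = [matches(sub, doc, params) for sub in sub_filters(flt)]
    if kind == "not":
        return not results[0]
    if kind == "and":
        return all(results)
    if kind == "or":
        return any(results)
    if kind == "xor":
        return sum(results) % 2 == 1
    raise ValueError(f"unknown filter type: {kind}")


def depth(flt):
    """Nesting depth of a filter; a lone match-filter has depth 1."""
    return 1 + max((depth(sub) for sub in sub_filters(flt)), default=0)


def count_matches(flt):
    """Number of match-filters anywhere in the filter tree."""
    if flt["type"] == "match":
        return 1
    return sum(count_matches(sub) for sub in sub_filters(flt))


def is_acceptable(flt, max_depth=3, max_match_filters=2):
    """Static check an admin could run before the filter ever touches a doc."""
    return depth(flt) <= max_depth and count_matches(flt) <= max_match_filters
```

For example, with a hypothetical `Kind` parameter:

```python
wiki_or_public_blog = {
    "type": "or",
    "filter": [
        {"type": "match", "filter": {"kind": "$Kind$"}},
        {"type": "match", "filter": {"kind": "blog", "public": True}},
    ],
}
params = {"Kind": "wiki"}

matches(wiki_or_public_blog, {"kind": "wiki", "title": "Home"}, params)  # True
matches(wiki_or_public_blog, {"kind": "blog", "public": False}, params)  # False
is_acceptable(wiki_or_public_blog)                                       # True
```

Since the filter is plain data, the size and shape checks run without executing anything supplied by the client, which is the property a while(1) {} predicate cannot offer.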

I would like to hear what you think about all the different approaches.



Martin

P.S.: Again, sorry for not being able to provide code. The sole purpose
of this email is to document some thoughts and others' ideas.
