Partial replication -orelse- sending interpreted data to another server

Martin Scholl Mon, 16 Feb 2009 08:55:05 -0800

Hello all,


On #couchdb we discussed how partial replication could be implemented.
We went over the pros and cons, and davisp asked me to write an
email to d...@. Well, here it is...

===

Basically, two approaches to replication were discussed (the names are
my own invention; please substitute more appropriate ones):

a) "Push-pull scenario":
a client wishing to have some documents replicated sends the
replicating server a design doc containing a predicate. The predicate
determines which docs are to be replicated ("pull replication").

b) "Pull-pull scenario":
a DB admin adds a set of design docs, which a client then triggers to
retrieve the docs or the set of docids.

While a) is way more versatile, variant b) leaves the admin with more
control over what happens with his/her database.

My concerns with a) are:
- it breaks with the principle "payload is payload, and code is code",
- it opens the door to several DoS attacks. Imagine a predicate doing
while(1) {}. Setting the per-doc timeout to a low value (as Jan suggested)
doesn't really solve the CPU hogging issue.

So, this all boils down to the questions:
- what principles for selective replication should be employed?
- how can we establish a system of trust for foreign JavaScript code
(e.g. code signing and all that stuff)?
- is the solution for all this "make the replication regime a db
configuration option"?

Although it would break with several of CouchDB's traditions, a solution
that is secure and versatile could be a descriptive approach. Something
like this (simplified JSON):

replicate: {
  input: {
     <Param1>: { type: int; default: 42; },
     <Param2>: { type: string; default: "wiki/"}
  }

  filter: {
     <set of filters>
  }
};

with a filter being recursively defined:
  filter: {
    type: <and,or,xor,not>;
    filter: <recursive definition of a filter>
  }

The filter family and, or, xor, not describes how the recursive
sub-filters are composed. A second type of filter is:

filter: {
  type: match;
  filter: <Json struct>
}

where <Json struct> is any JSON object, which may embed placeholders of
the form '$<Param1>$' that are dynamically substituted with the script's
input variables.
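
To make the idea a bit more concrete, here is a minimal Python sketch of
how such a filter could be evaluated against a document. It is only an
illustration under assumptions the proposal leaves open: sub-filters are
given as a list, a match filter succeeds when every key of the pattern
is present in the document with an equal value, a placeholder must be a
whole string value of the form '$Name$', xor means "an odd number of
sub-filters match", and not means "no sub-filter matches". All function
and parameter names (resolve_inputs, wants, Section, ...) are
hypothetical.

```python
TYPES = {"int": int, "string": str}  # the two types used in the sketch

def resolve_inputs(spec, supplied):
    """Fill in defaults and type-check the caller's parameters."""
    values = {}
    for name, cfg in spec.items():
        value = supplied.get(name, cfg["default"])
        if not isinstance(value, TYPES[cfg["type"]]):
            raise TypeError("parameter %r must be of type %s" % (name, cfg["type"]))
        values[name] = value
    return values

def substitute(value, params):
    """Replace whole-string '$Name$' placeholders with input values."""
    if isinstance(value, str) and len(value) > 2 \
            and value.startswith("$") and value.endswith("$"):
        return params[value[1:-1]]
    if isinstance(value, dict):
        return {k: substitute(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, params) for v in value]
    return value

def contains(doc, pattern):
    """Assumed match semantics: every key in pattern appears in doc, equal."""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in pattern.items())
    return doc == pattern

def evaluate(flt, doc, params):
    """Evaluate a match filter or a combinator over its sub-filters."""
    kind = flt["type"]
    if kind == "match":
        return contains(doc, substitute(flt["filter"], params))
    results = [evaluate(sub, doc, params) for sub in flt["filter"]]
    if kind == "and":
        return all(results)
    if kind == "or":
        return any(results)
    if kind == "xor":
        return sum(results) % 2 == 1
    if kind == "not":
        return not any(results)
    raise ValueError("unknown filter type: %r" % kind)

def wants(replicate, doc, supplied=None):
    """True if doc should be replicated under this definition."""
    params = resolve_inputs(replicate["input"], supplied or {})
    return evaluate(replicate["filter"], doc, params)

# Hypothetical definition: wiki pages of one section, plus all comments.
REPLICATE = {
    "input": {"Section": {"type": "string", "default": "wiki"}},
    "filter": {"type": "or", "filter": [
        {"type": "match", "filter": {"kind": "page", "section": "$Section$"}},
        {"type": "match", "filter": {"kind": "comment"}},
    ]},
}
```

A client would call wants(REPLICATE, doc) for the defaults, or pass
something like {"Section": "blog"} to override an input. The point is
that no foreign code is executed: the server only walks a data structure.
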
With such a descriptive filter definition we can
a) match several types of documents by using an or-filter together with
several match-filters as sub-filters,
b) more importantly: start reasoning about the filters and enforce
several security constraints (e.g. max. filter depth: 3, only or-filters
allowed, only 2 match-filters allowed, etc.).
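
As a rough sketch of what "reasoning about the filters" could look
like, the following Python function walks a filter definition before it
is ever run and rejects it if it violates the example limits above. The
function name check_filter, its parameters, and the assumption that
sub-filters are given as a list are mine and not part of any existing
CouchDB API; the depth of the outermost filter is counted as 1.

```python
COMBINATORS = {"and", "or", "xor", "not"}

def check_filter(flt, max_depth=3, max_matches=2, allowed=("or",)):
    """Validate a filter definition; return its number of match filters.

    Raises ValueError if the filter is nested too deeply, uses a
    combinator that is not permitted, or has too many match filters.
    """
    def walk(node, depth):
        if depth > max_depth:
            raise ValueError("filter is nested deeper than %d levels" % max_depth)
        kind = node["type"]
        if kind == "match":
            return 1  # leaf: counts as one match filter
        if kind not in COMBINATORS:
            raise ValueError("unknown filter type: %r" % kind)
        if kind not in allowed:
            raise ValueError("filter type %r is not permitted" % kind)
        # Combinator: count the match filters in all sub-filters.
        return sum(walk(sub, depth + 1) for sub in node["filter"])

    matches = walk(flt, 1)
    if matches > max_matches:
        raise ValueError("more than %d match filters" % max_matches)
    return matches
```

Because the check is a bounded walk over a finite data structure, its
cost is known up front, which is exactly what an arbitrary predicate
such as while(1) {} cannot offer.
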

I would like to hear what you think about all the different approaches.


Martin

P.S.: Again, sorry for not being able to provide code. The sole purpose
of this email is to document some thoughts and others' ideas.
