Emmanuel Lecharny wrote:
Hi guys,
the current implementation of replication, AFAICT, is based on a push
system. That is, each time you modify an entry, a message is sent to
all the replicas, and the server waits for an ack to be returned.
The main advantage is that we can't be faster: we replicate as soon as
possible.
The main issue is that if a replica is not connected, we will try and
try and try, until the remote server is back.
That can become a serious PITA; it certainly was with OpenLDAP slurpd.
Here are some ideas I rehashed on the train on my way to the office
these last two days...
- We should ask the replicas to register with the other servers using an
LDAP extended request
- then the server will push the modifications into a blocking queue for
each replica
- the blocking queue is read by a thread, and the modifications are
stored in a persistent store and sent to the replicas using an LDAP
request with a control (Replication)
- the replica receives the modifications as simple LDAP requests, plus a
control, deals with them, and sends back an LDAP response with a
status, allowing the modification to be removed from the store.
- if the replica is disconnected (for any reason), the server does not
send any more modifications to the replica until the replica connects again.
- in this case, we simply restart the thread and send all the pending
modifications found in the store and in the queue to the replica.
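The steps above can be sketched roughly as follows. This is only an
illustration of the proposed scheme, not existing server code: the class
name ReplicaOutbox, its methods, the use of plain strings for
modifications, and the in-memory map standing in for the persistent store
are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical per-replica outbox: a blocking queue feeding a store of unacknowledged modifications. */
final class ReplicaOutbox {
    // Modifications accepted by the server but not yet moved to the store.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Modifications awaiting an ack, keyed by sequence number.
    // An in-memory stand-in for the persistent store (assumption).
    private final TreeMap<Long, String> store = new TreeMap<>();
    private long nextSeq = 1;

    /** Called by the server for every local modification. */
    void enqueue(String modification) {
        queue.add(modification);
    }

    /**
     * Moves everything currently queued into the store and returns all
     * modifications that have not been acknowledged yet, oldest first.
     * After a reconnect this is exactly the set that has to be resent.
     */
    synchronized SortedMap<Long, String> pending() {
        List<String> drained = new ArrayList<>();
        queue.drainTo(drained);
        for (String modification : drained) {
            store.put(nextSeq++, modification);
        }
        return new TreeMap<>(store);
    }

    /** Called when the replica answers with a success status for one modification. */
    synchronized void ack(long seq) {
        store.remove(seq);
    }
}
```

A real sender thread would block on queue.take() instead of polling, and
would stop sending while the replica is disconnected. Note that the store
grows for as long as acks do not arrive.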
Likewise, a queue of pending mods can become a serious PITA if the replica is
offline for a long time.
See section 17.2.4.1 "Replacing Slurpd" here
http://www.openldap.org/doc/admin24/replication.html
for some of those issues. IME, a replication mechanism that requires the
provider to maintain per-replica state will not scale.
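As a loose illustration of the alternative, here is a minimal sketch in
which the replica, not the provider, remembers how far it has got and
presents that position (a cookie) when it connects. It is not how
OpenLDAP implements syncrepl; the class ChangeLogProvider, its methods,
and the use of a simple counter as the cookie are assumptions made for
the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical provider that keeps one shared change log and no per-replica state. */
final class ChangeLogProvider {
    // Single ordered log shared by all replicas, keyed by sequence number.
    private final TreeMap<Long, String> log = new TreeMap<>();
    private long lastSeq = 0;

    /** Records a local modification and returns its sequence number. */
    synchronized long record(String modification) {
        log.put(++lastSeq, modification);
        return lastSeq;
    }

    /** Returns every modification recorded strictly after the replica's cookie, oldest first. */
    synchronized List<String> changesAfter(long cookie) {
        return new ArrayList<>(log.tailMap(cookie, false).values());
    }

    /** The cookie a replica should keep once it has applied everything returned so far. */
    synchronized long currentCookie() {
        return lastSeq;
    }
}
```

A replica that was offline simply reconnects with its last cookie and
asks for changesAfter(cookie); the provider does not need to know that
the replica exists, or whether it considers it connected.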
There are a few questions I still have to rehash:
- how many threads should we have? A pool, or one thread per replica?
- how do we manage the queue and the store?
- when we reconnect, how do we tell the server which entry was the last
one correctly replicated?
- and also, how do we deal with reconnection if the server considers the
replica to be still connected?
wdyt?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/