Hi guys,
the current implementation of replication, AFAICT, is based on a push
system: each time you modify an entry, a message is sent to all the
replicas, and we wait for an ack to be returned.
The main advantage is that we can't be faster: we replicate as soon as
possible.
The main issue is that if a replica is not connected, we will try and
try and try until the remote server is back.
Here are some ideas I rehashed on the train on my way to the office
these last two days...
- We should ask the replicas to register with the other servers using an
LDAP extended request
- the server then pushes the modifications into a blocking queue for
each replica
- the blocking queue is read by a thread; the modifications are
stored in a backing store, and sent to the replica using an LDAP request,
with a (Replication) control (see the sketch after this list)
- the replica receives the modifications as plain LDAP requests, plus a
control, processes them, and sends back an LDAP response with a
status, allowing the modification to be removed from the store.
- if the replica is disconnected (for any reason), the server does not
send any more modifications to that replica until it connects again.
- in this case, we simply restart the thread and send all the pending
modifications found in the store and in the queue to the replica.
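To make the queue/thread idea a bit more concrete, here is a minimal Java
sketch of what one pusher per replica could look like. All the names
(Modification, PendingStore, ReplicaConnection, ReplicaPusher) are made up
for illustration, this is not the real ApacheDS API:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Placeholder types, just to keep the sketch self-contained.
interface Modification {}                      // one LDAP change (add/mod/del/...)
interface PendingStore {                       // durable store of unacknowledged changes
    void save(Modification mod);
    void remove(Modification mod);
}
interface ReplicaConnection {                  // LDAP connection to one registered replica
    boolean send(Modification mod);            // sends the request + Replication control,
                                               // returns true when the replica acks
}

public class ReplicaPusher implements Runnable {
    private final BlockingQueue<Modification> queue = new LinkedBlockingQueue<>();
    private final PendingStore store;
    private final ReplicaConnection replica;
    private volatile boolean connected = true;

    public ReplicaPusher(PendingStore store, ReplicaConnection replica) {
        this.store = store;
        this.replica = replica;
    }

    // Called by the server for every local modification.
    public void enqueue(Modification mod) {
        store.save(mod);       // persist first, so nothing is lost on a crash
        queue.offer(mod);
    }

    public void run() {
        while (connected) {
            try {
                Modification mod = queue.take();
                if (replica.send(mod)) {
                    store.remove(mod);   // ack received: the change can be forgotten
                } else {
                    connected = false;   // stop pushing until the replica registers again
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}

Whether each replica gets its own thread like this, or they share a pool, is
exactly one of the open questions below.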
There are a few questions I still have to rehash:
- how many threads should we have? A pool, or one thread per replica?
- how do we manage the queue and the store?
- when we reconnect, how do we tell the server which was the last entry
correctly replicated? (a possible approach is sketched after these questions)
- and also, how do we deal with reconnection if the server considers the
replica still connected?
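For the "last entry correctly replicated" question, one possible answer
(just a sketch, reusing the placeholder types above, and assuming each
change carries an ordered identifier, something CSN-like) is to have the
replica send a cookie with its registration request and let the server
replay everything after it:

import java.util.List;

interface ChangeLog {                          // hypothetical ordered log of past changes
    List<Modification> changesAfter(String csn);
}

class ReplicaRegistration {                    // carried by the registration extended request
    final String replicaId;
    final String lastAppliedCsn;               // cookie: last change the replica confirmed

    ReplicaRegistration(String replicaId, String lastAppliedCsn) {
        this.replicaId = replicaId;
        this.lastAppliedCsn = lastAppliedCsn;
    }
}

class ReplayService {
    private final ChangeLog changeLog;

    ReplayService(ChangeLog changeLog) {
        this.changeLog = changeLog;
    }

    // On re-registration, enqueue everything the replica missed, then resume live pushing.
    void replayFrom(ReplicaRegistration reg, ReplicaPusher pusher) {
        for (Modification mod : changeLog.changesAfter(reg.lastAppliedCsn)) {
            pusher.enqueue(mod);
        }
    }
}

That might also cover the "server still thinks the replica is connected"
case: a fresh registration would simply replace the old pusher for that
replica.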
wdyt ?
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org