Hi guys,
so we found the reason why the replication tests are failing randomly.
Let me explain:
- the consumer is connected to the provider until it gets disconnected.
It can last for days or weeks.
- the provider pushes modifications to the consumer directly if the
consumer is connected
- if the consumer is disconnected, the modifications are stored in a
queue until the consumer reconnects, at which point the content of
this queue is sent to it
That being said, we have one corner case, when the provider 'thinks' that
the consumer is connected when it's not anymore: the message is sent to
the disconnected consumer, and we don't push it to the queue, so it is
lost.
A better idea is to push *all* the modifications to the queue, no
matter what. A thread then processes this queue and sends its contents
to the consumer, unless the consumer isn't connected. In any case, we
*don't* delete messages from the queue. Never.
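
To make the idea concrete, here is a minimal Java sketch of such a
queue. Everything in it is an assumption made for illustration: the
class name ModificationLog, its methods, and the use of a plain long in
place of a real CSN. It is not the actual server code, just one way the
"always append, never delete on send" rule could look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch, not the actual server implementation.
class ModificationLog {
    // Modifications ordered by CSN; a long stands in for a real CSN here.
    private final ConcurrentSkipListMap<Long, String> entries =
        new ConcurrentSkipListMap<>();

    // Called for every modification, whether or not the consumer is connected.
    void append(long csn, String modification) {
        entries.put(csn, modification);
    }

    // What the sender thread has to (re)send to a consumer whose last
    // received CSN is lastEntryCsn. Reading removes nothing from the log.
    List<String> pendingAfter(long lastEntryCsn) {
        return new ArrayList<>(entries.tailMap(lastEntryCsn, false).values());
    }

    int size() {
        return entries.size();
    }
}
```

Because sending does not remove anything, a message pushed to a dead
connection is still in the log, and it is picked up again the next time
the consumer comes back with an older CSN.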
That raises a question: what do we do in the long term? The queue will
grow and never shrink. In fact this is quite simple: we truncate the
queue after a defined period of time (say once a day, or once a week).
Every modification older than the interval is simply deleted from the
queue.
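
As a rough illustration of the periodic truncation, assuming each
queued modification is keyed by a unique timestamp in milliseconds (the
class AgedLog and its method names are invented for this sketch):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a queue keyed by modification timestamp (millis).
// Assumes timestamps are unique; a real CSN would guarantee that.
class AgedLog {
    private final TreeMap<Long, String> byTimestamp = new TreeMap<>();

    synchronized void append(long timestampMillis, String modification) {
        byTimestamp.put(timestampMillis, modification);
    }

    // Deletes every entry strictly older than (now - retention) and
    // returns how many entries were dropped.
    synchronized int truncateOlderThan(long nowMillis, long retentionMillis) {
        long cutoff = nowMillis - retentionMillis;
        NavigableMap<Long, String> old = byTimestamp.headMap(cutoff, false);
        int removed = old.size();
        old.clear(); // headMap is a view, so this removes from the log itself
        return removed;
    }

    synchronized int size() {
        return byTimestamp.size();
    }
}
```

The truncation would be run by a background task once per interval; how
it gets scheduled is left out here.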
What if a consumer is not able to reconnect within this period of time?
Simple:
- the consumer sends the lastEntryCSN it received, and if it's older
than what's in the queue, then we do a full replication.
It may seem costly, but it's unlikely that a consumer gets disconnected
for a long period of time. All in all, it's as if we had just added a
brand new consumer, with nothing in it.
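
The check on reconnect could be sketched like this. The names
ResyncPolicy, Mode and choose are made up for the example, and CSNs are
again simplified to longs:

```java
// Hypothetical sketch of the decision taken when a consumer reconnects.
class ResyncPolicy {
    enum Mode { INCREMENTAL, FULL }

    // consumerLastCsn:  the lastEntryCSN presented by the consumer
    // oldestCsnInQueue: the oldest CSN still held in the provider queue
    static Mode choose(long consumerLastCsn, long oldestCsnInQueue) {
        // Older than anything we kept: entries in between may have been
        // truncated, so the safe answer is a full replication.
        return consumerLastCsn < oldestCsnInQueue ? Mode.FULL : Mode.INCREMENTAL;
    }
}
```

This version is deliberately conservative: it asks for a full
replication whenever it cannot prove that nothing was truncated in
between.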
One option would be to ask the consumer to send a periodic message to
the provider informing it that it's up to date. It could be a daily
unbind/bind for instance. The unbind will kill the pending persistent
search we established between the provider and the consumer, so that a
new one gets established. As the consumer will send a new request, with
the lastEntryCSN, we will be able to truncate the provider queue, so it
won't grow forever.
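
A possible shape for the truncation on rebind, assuming a single
consumer and long-valued CSNs (AckedLog and onRebind are invented names,
not existing API):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: the queue is trimmed when the consumer rebinds
// and presents its lastEntryCSN. Assumes one consumer per queue.
class AckedLog {
    private final TreeMap<Long, String> entries = new TreeMap<>();

    synchronized void append(long csn, String modification) {
        entries.put(csn, modification);
    }

    // Everything up to and including lastEntryCsn has reached the
    // consumer, so it can be dropped. Returns the number of entries removed.
    synchronized int onRebind(long lastEntryCsn) {
        NavigableMap<Long, String> acked = entries.headMap(lastEntryCsn, true);
        int removed = acked.size();
        acked.clear(); // view-backed removal from the underlying map
        return removed;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With several consumers on the same queue, the cut-off would have to be
the smallest lastEntryCSN across all of them.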
We will probably work on this idea with Kiran this week. I'm
positive that it can work well by the end of this week, or even earlier.
Stay tuned!
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com