Re: Replication heads up

Pierre-Arnaud Marcelot Mon, 08 Aug 2011 03:18:12 -0700

Thanks for the heads-up, guys!

Regards,
Pierre-Arnaud


On 8 Aug 2011, at 11:35, Kiran Ayyagari wrote:

> On Mon, Aug 8, 2011 at 2:26 PM, Emmanuel Lecharny <[email protected]> wrote:
>> Hi guys,
>> 
>> so we found the reason why the replication tests are failing randomly. Let
>> me explain :
>> 
>> - the consumer is connected to the provider until it gets disconnected. It
>> can last for days or weeks.
>> - the producer pushes modifications to the consumer directly if the consumer
>> is connected
>> - if the consumer is disconnected, the modifications are stored in a queue,
>> waiting for the client to reconnect to send it the content of this queue
>> 
>> That being said, we have one corner case when the provider 'thinks' that the
>> consumer is connected when it's not anymore : the message is sent to the
>> disconnected client, and we don't push it to the queue, losing it.
>> 
>> One better idea is to push *all* the modifications to the queue, no matter
>> what. Then a thread will process this queue and send its contents to the
>> client, unless the client isn't connected. In any case, we *don't* delete
>> messages from the queue. Never.
>> 
>> That raises a question: what do we do in the long term? The queue will grow
>> and never shrink. In fact this is quite simple: we truncate the queue after
>> a defined period of time (say once a day, or once a week). Every modification
>> older than the interval is simply deleted from the queue.
>> 
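A minimal Java sketch of the append-always queue with time-based truncation described above. All class and method names are hypothetical and are not taken from the ApacheDS code base, and a plain millisecond timestamp stands in for the CSN.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not ApacheDS API.
final class ModificationLog {
    // One logged modification; the timestamp is a stand-in for a CSN.
    static final class Entry {
        final long timestampMillis;
        final String change;

        Entry(long timestampMillis, String change) {
            this.timestampMillis = timestampMillis;
            this.change = change;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Every modification is appended, whether or not the consumer looks connected.
    synchronized void append(long timestampMillis, String change) {
        entries.add(new Entry(timestampMillis, change));
    }

    // The sender thread only reads; sending never removes anything from the log.
    synchronized List<Entry> entriesAfter(long lastSeenMillis) {
        List<Entry> pending = new ArrayList<>();
        for (Entry e : entries) {
            if (e.timestampMillis > lastSeenMillis) {
                pending.add(e);
            }
        }
        return pending;
    }

    // Periodic truncation: drop entries older than the retention interval.
    // Returns how many entries were removed.
    synchronized int truncate(long nowMillis, long retentionMillis) {
        long cutoff = nowMillis - retentionMillis;
        int before = entries.size();
        entries.removeIf(e -> e.timestampMillis < cutoff);
        return before - entries.size();
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A scheduled task could call `truncate` once a day or once a week; the sender thread would call `entriesAfter` with the last position it knows the consumer has acknowledged.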
>> What if a consumer is not able to reconnect within this period of time ?
>> Simple :
>> - the consumer sends the lastEntryCSN it received, and if it's older than
>> what's in the queue, then we do a full replication.
>> 
>> It may seem costly, but it's unlikely that a consumer gets disconnected for
>> a long period of time. All in all, it's as if we had just added a brand new
>> consumer, with nothing in it.
>> 
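The reconnect decision above could be sketched as follows. This is only an illustration: the names are hypothetical, and it assumes CSNs are strings whose lexicographic order matches their chronological order.

```java
// Hypothetical sketch: these names are illustrative, not ApacheDS API.
final class SyncDecision {
    enum Mode { INCREMENTAL, FULL_REFRESH }

    // lastEntryCsn:   the CSN the consumer sends on reconnect (null for a brand new consumer)
    // oldestCsnInLog: the oldest CSN still held in the provider's queue
    // Assumption: CSN strings compare lexicographically in chronological order.
    static Mode onReconnect(String lastEntryCsn, String oldestCsnInLog) {
        if (lastEntryCsn == null) {
            // A consumer with nothing in it always needs everything.
            return Mode.FULL_REFRESH;
        }
        if (lastEntryCsn.compareTo(oldestCsnInLog) < 0) {
            // The consumer is behind what the queue still holds:
            // some modifications were truncated, so replay is not enough.
            return Mode.FULL_REFRESH;
        }
        return Mode.INCREMENTAL;
    }
}
```

Treating a missing `lastEntryCsn` the same way as a stale one mirrors the remark that a long-disconnected consumer is equivalent to a brand new one.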
>> One option would be to ask the consumer to send a periodic message to the
>> producer informing it that it's up to date. It could be a daily unbind/bind
>> for instance. The unbind will kill the pending persistent search we
>> established between the producer and consumer, to establish a new one. As we
>> will send a new request, with the lastEntryCSN, we will be able to truncate
>> the provider queue, so it won't grow forever.
>> 
> this case is already handled (in my recent commit), i.e., when a
> consumer reconnects we remove all the entries from the log that are older
> than the CSN value present in the cookie.
> 
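For illustration, pruning on reconnect might look like the sketch below. It is not the code from the commit mentioned above: the names are hypothetical, it covers a single consumer only, and it assumes CSN strings sort chronologically.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: these names are illustrative, not ApacheDS API.
final class ProviderLog {
    // Keyed by CSN so that "older than" is a simple range query.
    private final TreeMap<String, String> changesByCsn = new TreeMap<>();

    void record(String csn, String change) {
        changesByCsn.put(csn, change);
    }

    // Called when the consumer reconnects with a cookie carrying its last CSN.
    // Removes every entry strictly older than that CSN and returns the count.
    int pruneOlderThan(String cookieCsn) {
        NavigableMap<String, String> older = changesByCsn.headMap(cookieCsn, false);
        int removed = older.size();
        older.clear(); // headMap is a view, so this removes from the backing map
        return removed;
    }

    int size() {
        return changesByCsn.size();
    }

    String oldestCsn() {
        return changesByCsn.isEmpty() ? null : changesByCsn.firstKey();
    }
}
```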
> Restarting the consumer at periodic intervals is an interesting idea;
> it solves many cases of 'how to prune/truncate the log', except for a
> consumer that never reconnects, in which case we need to go for a
> time-based policy.
> 
>> We will probably work around this idea with Kiran this week. I'm positive
>> that it can work well by the end of this week, or even earlier.
>> 
>> Stay tuned !
>> 
> thanks for putting these in ink, Emmanuel
>> --
>> Regards,
>> Cordialement,
>> Emmanuel Lécharny
>> www.iktek.com
>> 
>> 
> 
> 
> 
> -- 
> Kiran Ayyagari
