Thanks for the heads-up, guys!

Regards,
Pierre-Arnaud

On 8 August 2011, at 11:35, Kiran Ayyagari wrote:

> On Mon, Aug 8, 2011 at 2:26 PM, Emmanuel Lecharny <[email protected]> wrote:
>> Hi guys,
>>
>> so we found the reason why the replication tests are failing randomly. Let
>> me explain:
>>
>> - the consumer is connected to the provider until it gets disconnected. It
>> can last for days or weeks.
>> - the producer pushes modifications to the consumer directly if the consumer
>> is connected
>> - if the consumer is disconnected, the modifications are stored in a queue,
>> waiting for the client to reconnect to send it the content of this queue
>>
>> That being said, we have one corner case when the provider 'thinks' that the
>> consumer is connected when it's not anymore: the message is sent to the
>> disconnected client, and we don't push it to the queue, losing it.
>>
>> One better idea is to push *all* the modifications to the queue, no matter
>> what. Then a thread will process this queue and send its contents to the
>> client, unless the client isn't connected. In any case, we *don't* delete
>> messages from the queue. Never.
>>
>> That raises a question: what do we do in the long term? The queue will grow
>> and never shrink. In fact this is quite simple: we truncate the queue after
>> a defined period of time (say once a day, or once a week). Every modification
>> older than the interval is simply deleted from the queue.
>>
>> What if a consumer is not able to reconnect within this period of time?
>> Simple:
>> - the consumer sends the lastEntryCSN it received, and if it's older than
>> what's in the queue, then we do a full replication.
>>
>> It may seem costly, but it's unlikely that a consumer gets disconnected for
>> a long period of time. All in all, it's as if we had just added a brand new
>> consumer, with nothing in it.
>>
>> One option would be to ask the consumer to send a periodic message to the
>> producer informing it that it's up to date. It could be a daily unbind/bind
>> for instance. The unbind will kill the pending persistent search we
>> established between the producer and consumer, to establish a new one. As we
>> will send a new request, with the lastEntryCSN, we will be able to truncate
>> the provider queue, so it won't grow forever.
>>
> this case is already handled (in my recent commit), i.e., when a
> consumer reconnects we remove all the entries from the log that are older
> than the CSN value present in the cookie.
>
> Restarting the consumer at periodic intervals is an interesting idea;
> it solves many cases of 'how to prune/truncate the log', except for a
> consumer that never reconnects, in which case we need to go for a
> time-based policy
>
>> We will probably work around this idea with Kiran this week. I'm positive
>> that it can work well by the end of this week, or even earlier.
>>
>> Stay tuned!
>>
> thanks for putting these in ink, Emmanuel
>> --
>> Regards,
>> Cordialement,
>> Emmanuel Lécharny
>> www.iktek.com
>
> --
> Kiran Ayyagari
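
For reference, the scheme discussed in this thread (log every modification no matter what, truncate the log by age, and fall back to a full replication when the consumer's lastEntryCSN predates what the log still holds) could be sketched roughly as below. This is only an illustration under stated assumptions, not the ApacheDS code: the class name ReplicationLog, its method names, and the modelling of a CSN as a plain long are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical provider-side log; names and the long-valued CSN are assumptions. */
class ReplicationLog {
    // Modifications ordered by CSN (modelled here as an increasing long).
    private final NavigableMap<Long, String> log = new TreeMap<>();
    // Highest CSN ever removed by truncation; MIN_VALUE means nothing was removed yet.
    private long highestTruncatedCsn = Long.MIN_VALUE;

    /** Always record the modification, whether or not the consumer looks connected. */
    synchronized void append(long csn, String modification) {
        log.put(csn, modification);
    }

    /** Time-based pruning: drop every entry whose CSN is strictly below the cutoff. */
    synchronized void truncateBefore(long cutoffCsn) {
        NavigableMap<Long, String> stale = log.headMap(cutoffCsn, false);
        if (!stale.isEmpty()) {
            highestTruncatedCsn = Math.max(highestTruncatedCsn, stale.lastKey());
            stale.clear(); // the view is backed by the log, so this removes the entries
        }
    }

    /**
     * Entries the consumer has not seen yet, given the lastEntryCSN it sent.
     * An empty Optional means the log no longer covers the gap, so the caller
     * has to perform a full replication instead.
     */
    synchronized Optional<List<String>> entriesAfter(long lastEntryCsn) {
        if (lastEntryCsn < highestTruncatedCsn) {
            return Optional.empty();
        }
        return Optional.of(new ArrayList<>(log.tailMap(lastEntryCsn, false).values()));
    }
}
```

A sender thread would call entriesAfter() with the CSN from the consumer's cookie each time the consumer (re)connects, and a scheduled task would call truncateBefore() once a day or once a week. Nothing is deleted on send, which is the point of the proposal: a message pushed to a consumer that turned out to be disconnected is still in the log on the next reconnect.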
