Re: Potentially irrecoverable lost updates

Justin Wed, 17 Dec 2014 06:32:29 -0800

Jens,

I believe I am currently seeing this issue, but I am not certain. In the 
logs below, the timeout interval is 5 seconds. Are these the logs you mentioned?


00:42:33.669456 Cache: Received #2 
("_user/bb039d924013d63f9e10ae99d46eeeb1bc59ffc9b2fcb4f6f692bf6af87eefdf")
00:42:33.669688 Cache:   Deferring #2 (1 now waiting for #1...#1)

00:42:39.564527 WARNING: changeCache: Giving up, accepting #2 even though 
#1 is missing -- db.(*changeCache)._addPendingLogs() at change_cache.go:320

Prior to seeing this post I had opened an issue on the Couchbase Lite 
Android GitHub, assuming it was a problem with CBL: 
https://github.com/couchbase/couchbase-lite-android/issues/473

If this is my problem, no sweat: I'll compile the latest Sync Gateway code 
and give it a try with the new 60 second timeout. 

Thanks in advance,

-Justin

P.S. 

Anton, awesome thread. Thanks a million.


On Friday, 5 December 2014 01:42:12 UTC+7, Jens Alfke wrote:
>
> Anton, 
>
> Yes, you're describing a worst-case scenario that can sometimes happen 
> under very heavy database-server load. The root of the problem is that the 
> database may not deliver all notifications of document updates (the "TAP" 
> feed) in a timely manner when it's very busy. TAP itself isn't an ordered 
> stream, but Sync Gateway updates do have chronological sequence numbers and 
> need to be processed in order. So if one update goes missing for a long 
> time (it always arrives eventually but it can take minutes) it has to 
> buffer up all numerically-later updates until the missing one arrives. 
>
> To keep the gateway from blocking its update notifications indefinitely, 
> there's a timeout, as you said. After a while it will give up on a missing 
> update and proceed without it. When that update does arrive, it has to 
> ignore it because its sequence number is now out of order so there's no way 
> to re-insert it into change feeds that it's already delivered to clients. 
>
> The solution to this is the new update-notification system in Couchbase 
> Server 3.0, which is called DCP (Database Change Protocol). It's much 
> better about timely delivery. We unfortunately didn't have time to update 
> Sync Gateway to use this new protocol before Couchbase 3 shipped, but the 
> work is underway now and we plan to have it in a new Gateway release soon. 
>
> Until then, one workaround is to provision enough database-server 
> resources that the cluster nodes won't reach those levels of load during 
> actual use. (As I'm sure you've seen, the Gateway logs a warning whenever a 
> sequence gets dropped on the floor, so it's easy to detect the problem.) 
>
> Another workaround is to increase those limits ("buffers more than 10k 
> items, or waits for longer than 5 seconds") in the Gateway source code and 
> rebuild it. The downside is that instead of change notifications being 
> lost, you'll instead get greater latency in delivering them, but that may 
> be acceptable depending on your use case. These constants are in the file 
> change_cache.go: 
>
> // Max number of waiting sequences
> var MaxChannelLogPendingCount = 10000
> // Max time we'll wait for a missing sequence
> var MaxChannelLogPendingWaitTime = 5 * time.Second
>
> (In hindsight, the timeout should probably have been more like 60 
> seconds.) 
>
> —Jens
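
For anyone following along, here is a minimal sketch in Go of the buffering 
behaviour Jens describes: hold early sequences, release them in order, and 
give up on a gap once too many are waiting or the wait gets too long. It is 
not the actual change_cache.go code. Every name in it (sequencer, receive, 
tick, lowestPending) is made up for illustration; only the two limits mirror 
the constants quoted above. As a simplification, time is passed in as an 
offset from startup rather than read from the clock.

```go
package main

import "time"

// Illustrative defaults that mirror the limits quoted from change_cache.go.
const (
	maxPendingCount = 10000           // max number of waiting sequences
	maxPendingWait  = 5 * time.Second // max time to wait for a missing sequence
)

// sequencer (hypothetical name) releases sequence numbers in order and
// buffers any that arrive before their predecessors.
type sequencer struct {
	next     uint64                   // next sequence we expect to release
	pending  map[uint64]time.Duration // early sequence -> arrival time (offset from start)
	skipped  []uint64                 // sequences we gave up waiting for
	maxCount int
	maxWait  time.Duration
}

func newSequencer(first uint64) *sequencer {
	return &sequencer{
		next:     first,
		pending:  map[uint64]time.Duration{},
		maxCount: maxPendingCount,
		maxWait:  maxPendingWait,
	}
}

// receive records that seq arrived at time now and returns whatever can be
// released in order. A sequence older than next was already skipped (or is a
// duplicate), so it is ignored.
func (s *sequencer) receive(seq uint64, now time.Duration) []uint64 {
	if seq < s.next {
		return nil
	}
	if _, seen := s.pending[seq]; !seen {
		s.pending[seq] = now
	}
	return s.tick(now)
}

// tick releases buffered sequences in order. When the next expected sequence
// is missing, it keeps waiting unless a limit is exceeded, in which case the
// gap is recorded as skipped and processing moves on.
func (s *sequencer) tick(now time.Duration) []uint64 {
	var released []uint64
	for len(s.pending) > 0 {
		if _, ok := s.pending[s.next]; ok {
			released = append(released, s.next)
			delete(s.pending, s.next)
			s.next++
			continue
		}
		lowest, arrived := s.lowestPending()
		if len(s.pending) <= s.maxCount && now-arrived < s.maxWait {
			break // still within limits: keep waiting for s.next
		}
		for missing := s.next; missing < lowest; missing++ {
			s.skipped = append(s.skipped, missing)
		}
		s.next = lowest
	}
	return released
}

// lowestPending scans the buffer for the lowest waiting sequence. A linear
// scan keeps the sketch short; a heap would suit a large buffer better.
func (s *sequencer) lowestPending() (seq uint64, arrived time.Duration) {
	first := true
	for k, t := range s.pending {
		if first || k < seq {
			seq, arrived, first = k, t, false
		}
	}
	return seq, arrived
}
```

Fed the sequence of events from the logs above, receive(2, ...) returns 
nothing because #1 is still missing. A tick about 5.9 seconds later gives up 
on #1, records it as skipped, and releases #2. If #1 then shows up, it is 
ignored, which is the lost update this thread is about. Raising maxWait 
trades that loss for extra latency, as Jens notes.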
