> Each time we remove an
> instance, those users will go to a new Sling instance, and experience the
> inconsistency. Each time we add an instance, we will invalidate all
> stickiness and users will get re-assigned to a new Sling instance, and
> experience the inconsistency.

I can understand the issue when an existing Sling server is removed
from the pool. However, adding a new instance should not cause
existing users to be reassigned.

Now to your queries
---------------------------

> 1) When a brand new Sling instance discovers an existing JCR (Mongo), does it 
> automatically and immediately go to the latest head revision?

Yes, it sees the latest head revision.

>  Increasing load increases the number of seconds before a "sync," however 
> it's always near-exactly a second interval.

Yes, there is an "asyncDelay" setting in DocumentNodeStore which
defaults to 1 sec. Currently it is not possible to modify it via OSGi
config, though.
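
If you are embedding Oak and constructing the store yourself, the
delay can be set programmatically via DocumentMK.Builder#setAsyncDelay
(value in milliseconds). A minimal sketch, assuming a Mongo-backed
DocumentNodeStore (the connection details are placeholders):

    import com.mongodb.DB;
    import com.mongodb.MongoClient;
    import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
    import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;

    public class CustomAsyncDelay {
        public static void main(String[] args) {
            // Placeholder connection; point this at your Mongo instance
            DB db = new MongoClient("localhost", 27017).getDB("oak");

            // asyncDelay is in milliseconds and defaults to 1000 (1 sec);
            // here the background read/write would run every 500 ms
            DocumentNodeStore store = new DocumentMK.Builder()
                    .setMongoDB(db)
                    .setAsyncDelay(500)
                    .getNodeStore();

            store.dispose();
        }
    }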

>- What event is causing it to "miss the window" and wait until the next 1 
>second synch interval?

This periodic read also involves some other work, like local cache
invalidation and computing the external changes for observation, which
causes this time to increase. The more changes that are made, the more
time is spent on that kind of work.

Stickiness and Eventual Consistency
-------------------------------------------------

There are multiple levels of eventual consistency [1]. If we go for
sticky sessions then we are aiming for "session consistency". However,
what we require in most cases is read-your-writes consistency.

We can discuss ways to do that efficiently with the current Oak
architecture, though something like this is best discussed on oak-dev.
One possible approach would be to use a temporarily issued sticky
cookie. Under this model (a rough code sketch follows the steps
below):

1. The Sling cluster maintains a cluster-wide service which records
the current head revision of each cluster node and computes the
minimum of those revisions.

2. A Sling client (web browser) is free to connect to any server
until it performs a state-changing operation like a POST or PUT.

3. If it performs a state-changing operation, the server handling
that operation issues a cookie which is set to be sticky, i.e. the
load balancer is configured to treat it as the cookie that determines
stickiness. From then on, all requests from this browser go to the
same server. This cookie, let's say, records the current head
revision.

4. In addition, the Sling server would constantly be notified of the
minimum revision visible cluster-wide. Once that revision catches up
with the revision recorded in #3 (i.e. becomes at least as new), the
server removes the cookie on the next response sent to that browser.

This state can also be used to determine whether a server is safe to
take out of the cluster or not.
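
To make the moving parts a bit more concrete, here is a rough servlet
filter sketch of steps #3 and #4. Everything in it is hypothetical:
ClusterRevisionTracker stands in for the cluster-wide service from #1,
revisions are simplified to plain longs, and the cookie name is made
up.

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class StickyRevisionFilter implements Filter {

        // Hypothetical stand-in for the cluster-wide service in #1: it
        // knows this node's head revision and the minimum revision
        // visible on all cluster nodes
        interface ClusterRevisionTracker {
            long currentHeadRevision();
            long minimumClusterRevision();
        }

        private static final String COOKIE_NAME = "OAK_STICKY";
        private final ClusterRevisionTracker tracker;

        public StickyRevisionFilter(ClusterRevisionTracker tracker) {
            this.tracker = tracker;
        }

        @Override
        public void doFilter(ServletRequest req, ServletResponse res,
                FilterChain chain) throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;

            // #4: once every node has caught up to the revision recorded
            // in the cookie, stickiness is no longer needed -- expire it
            Cookie sticky = findStickyCookie(request);
            if (sticky != null && tracker.minimumClusterRevision()
                    >= Long.parseLong(sticky.getValue())) {
                Cookie expired = new Cookie(COOKIE_NAME, "");
                expired.setMaxAge(0);
                response.addCookie(expired);
            }

            chain.doFilter(req, res);

            // #3: after a state-changing operation, record the head
            // revision in a cookie the load balancer uses for stickiness.
            // (Assumes the response is not yet committed; a real filter
            // would need a response wrapper to guarantee that.)
            String method = request.getMethod();
            if ("POST".equals(method) || "PUT".equals(method)) {
                response.addCookie(new Cookie(COOKIE_NAME,
                        Long.toString(tracker.currentHeadRevision())));
            }
        }

        private static Cookie findStickyCookie(HttpServletRequest request) {
            Cookie[] cookies = request.getCookies();
            if (cookies != null) {
                for (Cookie c : cookies) {
                    if (COOKIE_NAME.equals(c.getName())) {
                        return c;
                    }
                }
            }
            return null;
        }

        @Override public void init(FilterConfig config) {}
        @Override public void destroy() {}
    }

With something like this, a browser is only pinned to one server for
the window during which the rest of the cluster has not yet seen its
write, which also bounds how long a server must stay in the pool
before it can safely be removed.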

This is just a rough thought experiment which may or may not work and
would require broader discussion!


Chetan Mehrotra
[1] http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
