Chetan is making things crystal clear for us. Our next steps are:
1) Learn what the MAXIMUM "inconsistency window" could be. Can the delay exceed 5 seconds? 10 seconds? 60? What determines this? Only server load? I'll ask on the JCR forum and also experiment.

2) Design and test a solution almost exactly as Bertrand described. Sling responds to POST/PUT/DELETE with a JCR revision. Sling will behave differently when the request contains a JCR revision more recent than its own current revision.

I have no idea what I'm getting into or how hard this will be. Bertrand, I'd feel selfish taking you up on your offer to build this for me. Yet I'd be a fool not to at least partner with you to get it done. Should we correspond outside this mailing list? Perhaps you could point me to the files you would edit to get this done and I could try to do it myself?

I imagine a solution where you can configure, through OSGi, whether Sling will do one of the following:

A) Ignore the JCR revision in the request, and function as it does today (default setting)
B) Block until it has caught up to the JCR revision in the request
C) Call some other custom handler. This way we can do custom things like send a redirect to enhance the user experience during a block. In a product like ours, 5 or 10 second blocks aren't acceptable without user feedback.

I also don't know how to determine the current Sling instance's revision, or how to compute whether one revision is "more recent" than another.

---------

Responding to a couple of other minor points:

Felix Meschberger-3 wrote
> I suggest you go with something else, which does *not* need the repository
> for persistence. This means you might want to investigate your own
> authentication handler ...

Thank you Felix :) I've actually done this work recently and it's working great! We have "stateless" authentication now, but we are now dealing with the unacceptable inconsistency that Chetan warned about.
That's the question on the table: in a write-heavy application, how do we provide a "read-your-writes" consistent experience on an eventually consistent solution (a Sling cluster), when traditional sticky sessions are not a valid solution because the user base is large enough to demand scaling the servers several times throughout the day?

chetan mehrotra wrote
> I can understand issue around when existing Sling server is removed
> from the pool. However adding a new instance should not cause existing
> users to be reassigned

When adding an instance, we purposely invalidate all sticky sessions so that users get re-assigned to a new Sling instance and the new server actually improves performance. Imagine a farm of 4 app servers that has been SLAMMED and isn't performing well. Adding 1 or 100 new servers to that farm won't improve performance if every user is "stuck" to the previous 4 servers. If we don't do this invalidation and re-assignment on scaling up, it can potentially take hours for a scale-up to positively impact an overloaded cluster.

Bertrand Delacretaz wrote
> But Lance could patch [1] to experiment with different values, right?
> ....
> [1]
> http://svn.apache.org/repos/asf/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java

Thank you for pointing me to the code Bertrand :) Given the new information from Chetan, I'm losing interest in changing that value. Perhaps setting asyncDelay to 0 or some small number will make it perform slower but be more consistent... However, my tentative assessment is that the interval would just be "checked" more often, but the update would also get skipped more often, due to "local cache invalidation, computing the external changes for observation" as Chetan put it. I would love to be wrong about this and I'll ask on the JCR forum.
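
To make the three configurable behaviours from my step 2 (A: ignore, B: block, C: custom handler) a bit more concrete, here is a rough Java sketch. Everything in it is hypothetical: RevisionGate, Mode, Outcome and LagHandler are made-up names, not Sling or Oak APIs. It also models a revision as a single increasing long, which is purely an assumption, since I still don't know how to read the instance's real revision or how to compare two of them.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch, not a Sling or Oak API.
// Assumption: a revision can be represented as a monotonically increasing long.
final class RevisionGate {
    enum Mode { IGNORE, BLOCK, CUSTOM }              // options A, B, C
    enum Outcome { PROCEED, TIMED_OUT, DELEGATED }

    // Option C: called when this instance is behind the requested revision.
    interface LagHandler { void onLag(long requested, long current); }

    private final Mode mode;
    private final LongSupplier currentRevision;      // how to obtain this is an open question
    private final long timeoutMillis;
    private final LagHandler handler;

    RevisionGate(Mode mode, LongSupplier currentRevision,
                 long timeoutMillis, LagHandler handler) {
        this.mode = mode;
        this.currentRevision = currentRevision;
        this.timeoutMillis = timeoutMillis;
        this.handler = handler;
    }

    Outcome check(long requestedRevision) {
        long current = currentRevision.getAsLong();
        // A) ignore the revision, or we have already caught up
        if (mode == Mode.IGNORE || current >= requestedRevision) {
            return Outcome.PROCEED;
        }
        // C) hand over to custom logic, e.g. to send a redirect
        if (mode == Mode.CUSTOM) {
            handler.onLag(requestedRevision, current);
            return Outcome.DELEGATED;
        }
        // B) poll until caught up, but never longer than the timeout
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (currentRevision.getAsLong() < requestedRevision) {
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt flag
                return Outcome.TIMED_OUT;
            }
        }
        return Outcome.PROCEED;
    }
}
```

In this sketch the mode and timeout would come from OSGi configuration, and a filter would call check() with the revision sent by the client before letting the request through. The timeout in BLOCK mode is there because I don't yet know the maximum inconsistency window from step 1.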
-- View this message in context: http://apache-sling.73963.n3.nabble.com/Not-sticky-sessions-with-Sling-tp4069530p4069730.html Sent from the Sling - Users mailing list archive at Nabble.com.