Re: Not-sticky sessions with Sling?
Hi Lance, On Wed, Jan 18, 2017 at 11:21 PM, lancedolan wrote: > ...Bertrand, I'd feel selfish taking you up on your offer to build this for > me. > Yet I'd be a fool to not at least partner with you to get it done. Should we > correspond outside this mail list?... I understand you're probably looking at a different solution now, but just wanted to clarify this: the Sling dev list would be the place to discuss such things; no need for off-list communications. -Bertrand
RE: Not-sticky sessions with Sling?
Jason Bailey wrote > Couldn't this be simplified to simply stating that the sticky session > cookie only lasts for x amount of seconds? WHOAAA!! Bertrand, probably hold the phone on everything else I suggested in my last post - this solution is insanely simple, embarrassingly obvious in hindsight, and the architects on our side can see no problem with it. We actually had no idea that there is an expiration-in-seconds setting in the AWS elastic load balancer. We just checked the interface and found the setting. Obviously in the good old days of F5 we could do whatever we wanted, but we're married to AWS now and had no idea we could do this. Thank you Jason, you might have just saved me some unsavory development task, whilst helping me Keep It Simple, Stupid. -- View this message in context: http://apache-sling.73963.n3.nabble.com/Not-sticky-sessions-with-Sling-tp4069530p4069731.html Sent from the Sling - Users mailing list archive at Nabble.com.
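For load balancers that lack a built-in duration setting like the AWS one mentioned above, the same bound can be approximated at the application level by issuing the stickiness cookie with a Max-Age. A minimal sketch, where the cookie name ("STICKY") and the idea of carrying a server id in it are placeholders rather than any Sling or AWS convention:

```java
/**
 * Sketch of a time-bounded sticky-session cookie: if the load balancer keys
 * stickiness off an application cookie, capping Max-Age caps how long a client
 * stays pinned to one server. Cookie name and payload are hypothetical.
 */
class StickyCookie {

    /** Builds a Set-Cookie header value that expires after maxAgeSeconds. */
    static String headerValue(String serverId, int maxAgeSeconds) {
        return "STICKY=" + serverId
                + "; Max-Age=" + maxAgeSeconds
                + "; Path=/; HttpOnly";
    }
}
```

The load balancer would still need to be configured for application-cookie stickiness for this to have any effect.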
Re: Not-sticky sessions with Sling?
Chetan is making things crystal clear for us. Our next steps are: 1) Learn what the MAXIMUM "inconsistency window" could be. Is it possible to delay past 5 seconds? 10 seconds? 60? What determines this? Only server load? I'll ask on the JCR forum and also experiment. 2) Design and test a solution almost exactly as Bertrand described. Sling responds to POST/PUT/DELETE with a JCR revision. Sling will behave differently when the request contains a JCR revision more recent than its current one. I have no idea what I'm getting into or how hard this will be. Bertrand, I'd feel selfish taking you up on your offer to build this for me. Yet I'd be a fool to not at least partner with you to get it done. Should we correspond outside this mail list? Perhaps you could point me to the files you would edit to get this done and I could try to do it myself? I imagine a solution where you can configure, through OSGi, whether Sling will do one of the following: A) Ignore the JCR revision in the request, and function as it does today (default setting) B) Block until it has caught up to the JCR revision in the request C) Call some other custom handler? This way we can do custom things like send a redirect to enhance the user experience during a block. In a product like ours, 5- or 10-second blocks aren't acceptable without user feedback. I also don't know how to determine the current Sling instance's revision, or how to compute whether one revision is "more recent" than another. - Responding to a couple other minor points: Felix Meschberger-3 wrote > I suggest you go with something else, which does *not* need the repository > for persistence. This means you might want to investigate your own > authentication handler ... Thank you Felix :) I've actually done this work recently and it's working great! We have "stateless" authentication now, but are now dealing with the unacceptable inconsistency that Chetan warned about.
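On the question of whether one revision is "more recent" than another: Oak's DocumentNodeStore revision strings have the shape "r<timestamp>-<counter>-<clusterId>" with hex-encoded fields (see org.apache.jackrabbit.oak.plugins.document.Revision). A rough standalone sketch of a comparison based on that format follows; note this is illustrative only, since Oak's own comparison of revisions from different cluster nodes is more involved than a plain timestamp ordering:

```java
/**
 * Illustrative parser/comparator for Oak-style revision strings of the form
 * "r<timestamp-hex>-<counter-hex>-<clusterId-hex>". Not Oak's real comparator:
 * cross-cluster-node ordering in Oak is subtler than this.
 */
class RevisionStamp implements Comparable<RevisionStamp> {
    final long timestamp;   // milliseconds since epoch
    final int counter;      // distinguishes revisions within the same millisecond
    final int clusterId;    // which cluster node created the revision

    RevisionStamp(long timestamp, int counter, int clusterId) {
        this.timestamp = timestamp;
        this.counter = counter;
        this.clusterId = clusterId;
    }

    static RevisionStamp parse(String rev) {
        if (!rev.startsWith("r")) {
            throw new IllegalArgumentException("not a revision string: " + rev);
        }
        String[] parts = rev.substring(1).split("-");
        return new RevisionStamp(Long.parseLong(parts[0], 16),
                Integer.parseInt(parts[1], 16),
                Integer.parseInt(parts[2], 16));
    }

    @Override
    public int compareTo(RevisionStamp other) {
        if (timestamp != other.timestamp) return Long.compare(timestamp, other.timestamp);
        if (counter != other.counter) return Integer.compare(counter, other.counter);
        return Integer.compare(clusterId, other.clusterId);
    }
}
```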
That's the question on the table: In a write-operation-heavy application, how do we provide a "read-your-writes" consistent experience on an eventually-consistent solution (a Sling cluster), when traditional sticky sessions are an invalid solution because your userbase is large enough to demand server-scaling several times throughout the day? chetan mehrotra wrote > I can understand issue around when existing Sling server is removed > from the pool. However adding a new instance should not cause existing > users to be reassigned When adding an instance, we purposely invalidate all sticky sessions so users get re-assigned to a new Sling instance, so that the new server actually improves performance. Imagine a farm of 4 app servers that has been SLAMMED and isn't performing well. Adding 1 or 100 new servers to that farm won't improve performance if every user is "stuck" to the previous 4 servers. If we don't do this invalidation and re-assignment on scaling up, it can potentially take hours for a scale-up to positively impact an overloaded cluster. Bertrand Delacretaz wrote > But Lance could patch [1] to experiment with different values, right? > > [1] > http://svn.apache.org/repos/asf/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java Thank you for pointing me to the code Bertrand :) On new information from Chetan, I'm losing interest in changing that value. Perhaps setting asyncDelay to 0 or some small number will cause it to perform slower but be more consistent... However, my tentative assessment is that the interval would just be "checked" more often, but it will also get skipped more often, due to "local cache invalidation, computing the external changes for observation" as Chetan put it. I would love to be wrong about this and I'll ask on the JCR forum.
RE: Not-sticky sessions with Sling?
Couldn't this be simplified to simply stating that the sticky session cookie only lasts for x amount of seconds? I like this idea, but I'm not sure this is really a Sling solution rather than an API management or proxy solution. When you take an instance out of the pool, you would need to state that it's not available for new requests, but still honor it for x amount of time for those with the sticky session cookie that says they should go there. -Jason -Original Message- From: Chetan Mehrotra [mailto:chetan.mehro...@gmail.com] Sent: Wednesday, January 18, 2017 6:49 AM To: users@sling.apache.org Subject: Re: Not-sticky sessions with Sling? > Each time we remove an > instance, those users will go to a new Sling instance, and experience > the inconsistency. Each time we add an instance, we will invalidate > all stickiness and users will get re-assigned to a new Sling instance, > and experience the inconsistency. I can understand the issue around when an existing Sling server is removed from the pool. However, adding a new instance should not cause existing users to be reassigned. Now to your queries --- > 1) When a brand new Sling instance discovers an existing JCR (Mongo), does it > automatically and immediately go to the latest head revision? It sees the latest head revision. > Increasing load increases the number of seconds before a "sync," however > it's always near-exactly a second interval. Yes, there is an "asyncDelay" setting in DocumentNodeStore which defaults to 1 sec. Currently it's not possible to modify it via OSGi config though. >- What event is causing it to "miss the window" and wait until the next 1 >second sync interval? This periodic read also involves some other work, like local cache invalidation and computing the external changes for observation, which causes this time to increase. The more changes done, the more time would be spent on that kind of work. Stickiness and Eventual Consistency - There are multiple levels of eventual consistency [1].
If we go for sticky sessions then we are trying for "session consistency". However, what we require in most cases is read-your-writes consistency. We can discuss ways to do that efficiently with the current Oak architecture. Something like this is best discussed on oak-dev though. One possible approach can be to use a temporarily issued sticky cookie. Under this model: 1. The Sling cluster maintains a cluster-wide service which records the current head revision of each cluster node and computes the minimum revision of them. 2. A Sling client (web browser) is free to connect to any server until it performs a state change operation like POST or PUT. 3. If it performs a state change operation then the server which performs that operation issues a cookie which is set to be sticky, i.e. the load balancer is configured to treat that as the cookie used to determine stickiness. So from now on all requests from this browser would go to the same server. This cookie, let's say, records the current head revision. 4. In addition the Sling server would constantly get notified of the minimum revision which is visible cluster-wide. Once the revision recorded in #3 becomes older than that cluster-wide minimum, it removes the cookie on the next response sent to that browser. This state can be used to determine if a server is safe to be taken out of the cluster or not. This is just a rough thought experiment which may or may not work and would require broader discussion! Chetan Mehrotra [1] http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
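Steps 1 and 4 of the scheme quoted above can be sketched roughly as follows. Revisions are simplified to longs, and every name here is hypothetical (no such Sling service exists today):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Rough sketch: a cluster-wide view of each node's head revision, its minimum,
 * and the test for when a sticky cookie issued at some revision can be dropped.
 */
class ClusterRevisionTracker {

    private final Map<String, Long> headByNode = new ConcurrentHashMap<>();

    /** Step 1: each cluster node periodically reports its current head revision. */
    void reportHead(String nodeId, long revision) {
        headByNode.put(nodeId, revision);
    }

    /** The newest revision guaranteed to be visible on every node. */
    long minimumVisibleRevision() {
        return headByNode.values().stream().mapToLong(Long::longValue).min().orElse(0L);
    }

    /** Step 4: once every node has caught up past the cookie's revision,
     *  stickiness is no longer needed and the cookie can be removed. */
    boolean canDropStickyCookie(long cookieRevision) {
        return minimumVisibleRevision() >= cookieRevision;
    }
}
```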
Re: Not-sticky sessions with Sling?
On Wed, Jan 18, 2017 at 12:48 PM, Chetan Mehrotra wrote: > ...there is an "asyncDelay" setting in DocumentNodeStore which > defaults to 1 sec. Currently it's not possible to modify it via OSGi > config though But Lance could patch [1] to experiment with different values, right? And then replace the oak-core bundle in Sling, starting from the right version for patching, the one his Sling instance currently uses. -Bertrand [1] http://svn.apache.org/repos/asf/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java
Re: Not-sticky sessions with Sling?
> Each time we remove an > instance, those users will go to a new Sling instance, and experience the > inconsistency. Each time we add an instance, we will invalidate all > stickiness and users will get re-assigned to a new Sling instance, and > experience the inconsistency. I can understand the issue around when an existing Sling server is removed from the pool. However, adding a new instance should not cause existing users to be reassigned. Now to your queries --- > 1) When a brand new Sling instance discovers an existing JCR (Mongo), does it > automatically and immediately go to the latest head revision? It sees the latest head revision. > Increasing load increases the number of seconds before a "sync," however > it's always near-exactly a second interval. Yes, there is an "asyncDelay" setting in DocumentNodeStore which defaults to 1 sec. Currently it's not possible to modify it via OSGi config though. >- What event is causing it to "miss the window" and wait until the next 1 >second sync interval? This periodic read also involves some other work, like local cache invalidation and computing the external changes for observation, which causes this time to increase. The more changes done, the more time would be spent on that kind of work. Stickiness and Eventual Consistency - There are multiple levels of eventual consistency [1]. If we go for sticky sessions then we are trying for "session consistency". However, what we require in most cases is read-your-writes consistency. We can discuss ways to do that efficiently with the current Oak architecture. Something like this is best discussed on oak-dev though. One possible approach can be to use a temporarily issued sticky cookie. Under this model: 1. The Sling cluster maintains a cluster-wide service which records the current head revision of each cluster node and computes the minimum revision of them. 2. A Sling client (web browser) is free to connect to any server until it performs a state change operation like POST or PUT. 3.
If it performs a state change operation then the server which performs that operation issues a cookie which is set to be sticky, i.e. the load balancer is configured to treat that as the cookie used to determine stickiness. So from now on all requests from this browser would go to the same server. This cookie, let's say, records the current head revision. 4. In addition the Sling server would constantly get notified of the minimum revision which is visible cluster-wide. Once the revision recorded in #3 becomes older than that cluster-wide minimum, it removes the cookie on the next response sent to that browser. This state can be used to determine if a server is safe to be taken out of the cluster or not. This is just a rough thought experiment which may or may not work and would require broader discussion! Chetan Mehrotra [1] http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Re: Not-sticky sessions with Sling?
Hi Lance, On Wed, Jan 18, 2017 at 2:43 AM, lancedolan wrote: > ...It pretty much always takes 1 second exactly for a Sling instance to get > the > latest revision, and thus the latest data. When not 1 second, it takes 2 > seconds exactly I don't know enough about Oak internals to give you a precise answer here, but this 1 second increment vaguely rings a bell, based on discussions with Chetan when working on our adaptTo demo [1]. Chetan is one of the few Sling committers who's deep into Oak as well; hopefully he can comment on this, but otherwise best would be to ask on the Oak dev list about that specific issue, as I think this delay is entirely Oak dependent. Apart from that, handling such things at the client level could be valid - as you say, if you had a way to send the current revision number to the client (in an opaque way, probably) it could add a header to its next request saying that it wants to see that revision, and Sling/Oak could block that request until that revision is available. I suppose a one or two second delay that happens only rarely is acceptable if it makes your system easier to scale, and hopefully that 1-second cycle can be configured to be shorter. I'm willing to help make this functionality available if you don't find a better way, as I think it can be generally useful. -Bertrand [1] https://github.com/bdelacretaz/sling-adaptto-2016
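The "block that request until that revision is available" idea could look something like the following, with the head revision simplified to a plain long and all names hypothetical. A real implementation would hook into Oak's observation mechanism rather than sleep-polling:

```java
import java.util.function.LongSupplier;

/**
 * Sketch: hold a request until the local instance's head revision has reached
 * the revision the client last saw, or a timeout expires. Illustrative only.
 */
class RevisionBarrier {

    /** Returns true once localHead >= wantedRevision, false on timeout. */
    static boolean await(LongSupplier localHead, long wantedRevision, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (localHead.getAsLong() < wantedRevision) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // caller can fail, retry elsewhere, or redirect with user feedback
            }
            try {
                Thread.sleep(20); // poll until the instance catches up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

On timeout the caller decides the policy, which maps onto the options discussed elsewhere in this thread (block, redirect, or give up).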
Re: Not-sticky sessions with Sling?
Hi Lance Ok, so being as it is — an eventually consistent repo replicating the Oak login token and no ability to use sticky sessions — I suggest you go with something else, which does *not* need the repository for persistence. This means you might want to investigate your own authentication handler or look at other options here at Sling — for example the old form-based login (not sure what its state is, though). Or good ol’ HTTP Basic (at some other prices, like no support for „logout“) Regards Felix > Am 18.01.2017 um 02:43 schrieb lancedolan: > > lancedolan wrote >> I must know what determines the duration of this revision catch-up time >> ... > > While I don't know where to look in the source code to answer this, I did run a > very revealing experiment. > > It pretty much always takes 1 second exactly for a Sling instance to get the > latest revision, and thus the latest data. When not 1 second, it takes 2 > seconds exactly. If you increase load on the server, the likelihood of > taking 2 seconds increases, and you also begin to see it take exactly 3 > seconds in some rare cases. Increasing load increases the number of seconds > before a "sync," however it's always near-exactly a second interval. > > It seems impossible for this to be a natural coincidence - I smell a setting > somewhere (or perhaps a hardcoded value) which is telling Sling to check the > latest JCR revision on 1-second intervals. When that window can't be hit, it > checks on the next second interval, and so on. > > Is there a Sling dev who can tell me whether this is configurable? I have a > load of questions about this discovery: > > - Am I wrong? (I'll be shocked) > - Perhaps we can speed it up? > - What event is causing it to "miss the window" and wait until the next > 1-second sync interval? > - If we do decrease the interval, will that just increase the likelihood of > taking more intervals anyhow? > - Is there a maximum number of 1-second intervals before the thing just > gets the latest?? > > progress.
RE: Not-sticky sessions with Sling?
This is tempting, but I know in my dev instinct that we won't have the time to solve all the unsolved problems in that effort. Thank you for suggesting it though :)
Re: Not-sticky sessions with Sling?
lancedolan wrote > I must know what determines the duration of this revision catch-up time > ... While I don't know where to look in the source code to answer this, I did run a very revealing experiment. It pretty much always takes 1 second exactly for a Sling instance to get the latest revision, and thus the latest data. When not 1 second, it takes 2 seconds exactly. If you increase load on the server, the likelihood of taking 2 seconds increases, and you also begin to see it take exactly 3 seconds in some rare cases. Increasing load increases the number of seconds before a "sync," however it's always near-exactly a second interval. It seems impossible for this to be a natural coincidence - I smell a setting somewhere (or perhaps a hardcoded value) which is telling Sling to check the latest JCR revision on 1-second intervals. When that window can't be hit, it checks on the next second interval, and so on. Is there a Sling dev who can tell me whether this is configurable? I have a load of questions about this discovery: - Am I wrong? (I'll be shocked) - Perhaps we can speed it up? - What event is causing it to "miss the window" and wait until the next 1-second sync interval? - If we do decrease the interval, will that just increase the likelihood of taking more intervals anyhow? - Is there a maximum number of 1-second intervals before the thing just gets the latest?? progress.
RE: Not-sticky sessions with Sling?
Not sure if this is of any help for your usecase - but do you need the full JCR features and complexity underneath Sling, or only a Sling cluster + storage in MongoDB? If you need only basic resource read and write features via the Sling API you might bypass JCR completely and directly use a NoSQL resource provider for MongoDB, see [1] and [2]. But please be aware that: 1. the code might not be production-ready for heavy usage yet (not sure how much it is used) 2. it does not add any support for cluster synchronization etc.; if your multiple nodes write to the same path you have to take care of concurrency yourself 3. the code is not yet migrated to the latest ResourceProvider SPI from Sling 9-SNAPSHOT, but should still run with it 4. it has no built-in support for ACLs etc.; you have to take care of this yourself This resource provider is only a thin layer above the MongoDB Java client, so it should be possible to have full control over which MongoDB features are used in which way. stefan [1] http://sling.apache.org/documentation/bundles/nosql-resource-providers.html [2] https://github.com/apache/sling/tree/trunk/contrib/nosql
Re: Not-sticky sessions with Sling?
Bertrand Delacretaz wrote > That would be a pity, as I suppose you're starting to like Sling now ;-) Man, you have no idea haha! I've got almost every dev in the office all excited about this now haha. However, it seems our hands are tied. I wrote local consistency test scripts which POST and immediately GET a property, checking for consistency. Results on a 2-member Sling cluster and localhost mongodb: - 0% consistency with 50ms delay between POST and GET - 35% to 50% consistency with 1 second delay between POST and GET - 90% consistency with 2 second delay - 98% to 100% consistency after 3 seconds delay. So yes, you are all correct. True, we could use sticky sessions to avoid inconsistency... but only until we scale our server farm up or down, which we do daily. So sticky sessions don't really solve anything for us. If you already understand how scaling nullifies the benefit of sticky sessions, you can skip past this paragraph and move on to the next: Each time we scale, users will lose their "stickiness." We have thousands of write users ("authors"). Hundreds concurrently. Compare that to typical AEM projects, which have fewer than 10 authors, and rarely more than 1 concurrently (I've got several global-scale AEM implementations under my belt). For us, it's a requirement that we add or remove app servers multiple times per day, optimizing between AWS costs and performance. Each time we remove an instance, those users will go to a new Sling instance, and experience the inconsistency. Each time we add an instance, we will invalidate all stickiness and users will get re-assigned to a new Sling instance, and experience the inconsistency. If we don't do this invalidation and re-assignment on scaling up, it can potentially take hours for a scale-up to positively impact an overloaded cluster where all users are permanently stuck to their current app server instance. As you can see, we need to deal with the inconsistency problem, regardless of whether we use sticky sessions.
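For reference, the shape of such a consistency probe, with the actual HTTP POST and GET abstracted as functions so the timing logic itself is self-contained (all names hypothetical; the real scripts hit two Sling instances over HTTP):

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

/**
 * Sketch of a read-your-writes consistency probe: write a value, wait a fixed
 * delay, read it back (ideally via a different cluster node), and report the
 * percentage of consistent reads.
 */
class ConsistencyProbe {

    static double consistentPercent(BiConsumer<String, String> post,
                                    Function<String, String> get,
                                    int trials, long delayMillis) {
        int consistent = 0;
        for (int i = 0; i < trials; i++) {
            String key = "k" + i, value = "v" + i;
            post.accept(key, value);            // write via node A
            try {
                Thread.sleep(delayMillis);      // the "inconsistency window" under test
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (value.equals(get.apply(key))) { // read via node B
                consistent++;
            }
        }
        return 100.0 * consistent / trials;
    }
}
```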
I have some ideas, but none are appealing, and I would benefit greatly from your collective knowledge: 1) Race condition. If this delay to "catch up" to the latest revision is mostly predictable - if it doesn't grow as the repo grows in size, or change due to other variables - we can measure it and then account for it reliably with user feedback (a loading screen or whatever). This *might* be a race condition we can live with. My results above show as much as 3 or 4 seconds to "catch up." I must know what determines the duration of this revision catch-up time. Is it a function of repo size? Does the delay grow as the repo size grows? Does the delay grow as usage increases? Does the delay grow as the number of Sling instances in the cluster grows? Does the delay grow as network latency grows (I'm testing all on the same machine, with practically no latency compared to a distributed production deployment)? Is there any Sling dev, familiar with the algorithm that Sling uses to select a "newer" revision, who could answer this for me? ... perhaps it's just polling on a predictable time period! :) 2) Browser knows what revision it's on. The browser could know what JCR revision it's on, learning that revision after every POST or PUT, perhaps in some response header. When its future requests are sent to a Sling instance on an older revision, it could wait until that instance "catches up." This sounds like a horrible example of client code operating on knowledge of underlying implementation details, and we're not at all excited about the chaos of implementing it. That being said, can we programmatically check the revision that the current Sling instance is reading from? 3) "Pause" during scale-up or scale-down. Each time we add or remove a Sling instance, all users experience a "pause" screen while their new Sling instance "catches up." This is essentially the same as the race condition in #1, except we'd constrain users to only experience this when we scale up or down.
However, we are *extremely* unhappy to impact our users just because we're scaling up or down, especially when we must do so frequently. Anybody have any other ideas? Other questions: 1) When a brand new Sling instance discovers an existing JCR (Mongo), does it automatically and immediately go to the latest head revision? Or is there some progression through the revisions, and it takes time for the Sling instance to catch up to the latest? 2) Is there any reason, BESIDES JCR CONSISTENCY, why a Sling cluster must be deployed with sticky sessions? What other problems would we introduce by not having sticky sessions? I seem to have used this email to track my own thoughts more than anything; my sincere thanks if you've taken the time to read the whole thing.
Re: Not-sticky sessions with Sling?
My bad: CAP = consistency, availability and partition-tolerance. Jörg 2017-01-17 19:35 GMT+01:00 Jörg Hoh: > Hi Lance, > > 2017-01-17 19:19 GMT+01:00 lancedolan : > >> ... >> >> If "being eventual" is the reason we can't go stateless, then how is Adobe >> getting away with it if we know their architecture is also eventual?? What >> am I missing? I understand that the documentation I linked is a >> distributed >> segment store architecture and mine is a shared documentstore datastore, >> but >> what is the REASON for them allowing a stateless (not sticky) >> architecture, >> if the REASON is not eventual consistency? Both architectures are >> eventual. >> >> > It depends a lot on your usecase. For example, Facebook is also eventually > consistent (I sometimes think that the timeline is different on every > reload). Also the CAP theorem says that you can choose only 2 of > "consistency, atomicity and partition-tolerance". > > In the case of independent segment stores (in Adobe speak: publish > instances, stateless loadbalancing) you have a lot of individual requests > from multiple users. So you as an individual cannot decide if another gets > the very same content as you. And as long as this eventual consistency is > not causing annoyances and friction on the end-user side (e.g. you hit an > intra-site link, which results in a 404), I would not consider it a > problem. And these problems occur so rarely that many (including me and > many other users of AEM) ignore it for daily work. But this is only valid > for a readonly usecase! > > The situation is different on the clustered documentNodeStore (in Adobe > speak: authoring, sticky connections). Due to write skew, write operations > will be visible with a small delay on all cluster nodes. But there > it matters that a user sees the changes he just did. And to overcome this > limitation with the write skew, the recommendation is to use > sticky-sessions.
> > > > Jörg > > > -- > Cheers, > Jörg Hoh, > > http://cqdump.wordpress.com > Twitter: @joerghoh > -- Cheers, Jörg Hoh, http://cqdump.wordpress.com Twitter: @joerghoh
Re: Not-sticky sessions with Sling?
Hi Lance, 2017-01-17 19:19 GMT+01:00 lancedolan: > ... > > If "being eventual" is the reason we can't go stateless, then how is Adobe > getting away with it if we know their architecture is also eventual?? What > am I missing? I understand that the documentation I linked is a distributed > segment store architecture and mine is a shared documentstore datastore, but > what is the REASON for them allowing a stateless (not sticky) architecture, > if the REASON is not eventual consistency? Both architectures are > eventual. > > It depends a lot on your usecase. For example, Facebook is also eventually consistent (I sometimes think that the timeline is different on every reload). Also the CAP theorem says that you can choose only 2 of "consistency, atomicity and partition-tolerance". In the case of independent segment stores (in Adobe speak: publish instances, stateless loadbalancing) you have a lot of individual requests from multiple users. So you as an individual cannot decide if another gets the very same content as you. And as long as this eventual consistency is not causing annoyances and friction on the end-user side (e.g. you hit an intra-site link, which results in a 404), I would not consider it a problem. And these problems occur so rarely that many (including me and many other users of AEM) ignore it for daily work. But this is only valid for a readonly usecase! The situation is different on the clustered documentNodeStore (in Adobe speak: authoring, sticky connections). Due to write skew, write operations will be visible with a small delay on all cluster nodes. But there it matters that a user sees the changes he just did. And to overcome this limitation with the write skew, the recommendation is to use sticky sessions. Jörg -- Cheers, Jörg Hoh, http://cqdump.wordpress.com Twitter: @joerghoh
Re: Not-sticky sessions with Sling?
Ok First of all - I GENUINELY appreciate the heck out of your time, and patience!! ... and THIS is really interesting: If THIS is true: chetan mehrotra wrote > If you are running a cluster with Sling on Oak/Mongo then sticky > sessions would be required due to eventual consistent nature of > repository. and THIS is true: chetan mehrotra wrote > Cluster which involves multiple datastores (tar) > is also eventually consistent. Then why is Adobe recommending its multi-million-dollar projects go stateless with the encapsulated token here, if those architectures are *also* eventually consistent: https://docs.adobe.com/docs/en/aem/6-1/administer/security/encapsulated-token.html If "being eventual" is the reason we can't go stateless, then how is Adobe getting away with it if we know their architecture is also eventual?? What am I missing? I understand that the documentation I linked is a distributed segment store architecture and mine is a shared documentstore datastore, but what is the REASON for them allowing a stateless (not sticky) architecture, if the REASON is not eventual consistency? Both architectures are eventual. Again, thanks for your patience and for sticking with me on this one... whoa, pun!
Re: Not-sticky sessions with Sling?
Hi, On Mon, Jan 16, 2017 at 9:16 PM, lancedolan wrote: > ...this probably shoots down our entire Sling > proof of concept project... That would be a pity, as I suppose you're starting to like Sling now ;-) > ...Is there any way > to force all reads to read the most recent revision, perhaps through some > configuration?... As Chetan says, that's a question for the Oak dev list, but from a Sling point of view having that option would be useful IMO. If the clustered Sling instances can get consensus on what the most recent revision is (*), having the option for Oak to block until it sees that revision sounds useful in some cases. That should probably happen either on opening a JCR Session or when Session.refresh() is called. -Bertrand (*) which might require an additional consensus mechanism, maybe via Mongo if that's what you're using?
Re: Not-sticky sessions with Sling?
On Tue, Jan 17, 2017 at 1:46 AM, lancedolan wrote: > It's ironic that the cluster which involves multiple datastores (tar), and > thus should have a harder time being consistent, is the one that can > accomplish consistency.. That's not how it is. A cluster which involves multiple datastores (tar) is also eventually consistent. Changes are either "pushed" to each tar instance via some replication, or changes done on one of the cluster nodes surface on the others via reverse replication. In either case a change is not immediately visible on other cluster nodes. > More importantly, is it a function of Repo size, or repo activity? > If the repo grows in size (number of nodes) and grows in use (number of > writes/sec) does this impact how frequently Sling Cluster instances grab the > most recent revision? It's somewhat related to the number of writes and is not dependent on repo size. > Less importantly... Myself and colleagues are really curious as to why > jackrabbit is implemented this way. Is there a performance benefit to being > eventually consistent, when the shared datastore is actually consistent? What's the > reasoning for not always hitting the latest data? Also... Is there any way > to force all reads to read the most recent revision, perhaps through some > configuration? That's a question best suited for discussion on the oak-dev mailing list (oak-...@jackrabbit.apache.org) Chetan Mehrotra
Re: Not-sticky sessions with Sling?
This is really disappointing for us. Through this revisioning, Oak has turned a datastore that is consistent by default into a datastore that is not :p It's ironic that the cluster which involves multiple datastores (tar), and thus should have a harder time being consistent, is the one that can accomplish consistency... and the cluster that involves a single shared source of truth (mongo/rdbms), and should have the easiest time being consistent, is not. Hehe.

Ahh, this probably shoots down our entire Sling proof of concept project.

Our next step is to measure the consequences of moving forward with Sling+Oak+Mongo and not-sticky sessions. I'm going to try to test this, and get an empirical answer, by deploying to some AWS instances. I'll develop a custom AuthenticationHandler so that authentication is stateless, and then we'll try to see how bad the "delay" might be. However, I would love a theoretical answer as well, if you've got one :)

chetan mehrotra wrote
> ... sticky sessions would be required due to eventual consistent nature of
> repository.

Okay, but if we disable sticky sessions ANYHOW (because in our environment we must), how much time delay are we talking, do you think, in realistic practice? We might be able to solve this by giving user feedback that covers up for the sync delay. When a user clicks save, they might just go to a different screen, providing enough time for things to sync up. It might be a race condition, but that might be acceptable if we can choose that architecture on good information.

I think that, in theory, the answer to "worst case scenario" for eventual consistency is always "forever," but really... How long could a Sling instance take to get to the latest revision? More importantly, is it a function of repo size, or repo activity? If the repo grows in size (number of nodes) and grows in use (number of writes/sec), does this impact how frequently Sling cluster instances grab the most recent revision?

Less importantly...
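[Editor's note: the empirical measurement Lance proposes can be sketched as a small harness. The `write` and `read` callables are injected (in a real test they would wrap HTTP POST/GET requests against two different Sling instances behind the load balancer); instance URLs and servlets are deliberately left out as they would be deployment-specific.]

```python
import time

def measure_visibility_delay(write, read, value, timeout=30.0, poll_interval=0.05):
    """Write a marker value through one cluster node, then poll a second node
    until the value becomes visible. Returns the observed delay in seconds,
    or raises TimeoutError if the change never shows up in time.
    """
    write(value)
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        if read() == value:
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError("value not visible within %.1f seconds" % timeout)
```

Running this repeatedly with fresh marker values would give a distribution of the "inconsistency window" under real load, which is what the thread is trying to estimate.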
Myself and colleagues are really curious as to why Jackrabbit is implemented this way. Is there a performance benefit to being eventually consistent, when the shared datastore is actually consistent? What's the reasoning for not always hitting the latest data? Also... Is there any way to force all reads to read the most recent revision, perhaps through some configuration? A performance cost for this might be tolerable.

--
View this message in context: http://apache-sling.73963.n3.nabble.com/Not-sticky-sessions-with-Sling-tp4069530p4069661.html
Sent from the Sling - Users mailing list archive at Nabble.com.
Re: Not-sticky sessions with Sling?
On Sat, Jan 14, 2017 at 2:08 AM, lancedolan wrote:
> To be honest, however, I don't understand fully
> what you said in your last post and I also know that AEM 6.1 can do what I'd
> like, which is really just Sling+Oak. If they can do it, I don't understand
> why we can't.
>
> ref:
> https://docs.adobe.com/docs/en/aem/6-1/administer/security/encapsulated-token.html

That link talks about scaling of publish instances, which are in most cases based on a Segment/Tar setup and hence not forming a "homogeneous" cluster. Each cluster node has a separate segment store and only potentially shares the DataStore.

> B) There are separate versions of that property stored in Mongo (perhaps
> this is what you meant by the word revision) and it's possible for a
> sling-instance to be reading an old version of a property from Mongo.

That's a bit closer to what's happening. [1] talks about the data model being used for persistence in Mongo/RDB. For example, if there is a property 'prop' on the root node, i.e. /@prop, then it is stored in somewhat the following form in Mongo:

{
  "_id" : "0:/",
  "prop" : {
    "r13fcda91720-0-1" : "\"foo\"",
    "r13fcda919eb-0-1" : "\"bar\""
  }
}

The value of this property is a function of the revision at which the read operation is performed. So the 'prop' value is 'foo' at rev r1 and 'bar' at rev r2. These revisions are based on timestamps.

Now each cluster node also has a "head" revision, so any read call on that cluster node would only see those values whose revisions are <= the "head" revision. This head revision is updated periodically via a background read. Due to this snapshot isolation model you see the write skew [2].

Chetan Mehrotra

[1] https://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html
[2] https://jackrabbit.apache.org/oak/docs/architecture/transactional-model.html
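[Editor's note: the revisioned-property model Chetan describes can be illustrated with a toy in-memory version. Revision ids are simplified to integers (real Oak revisions look like "r13fcda91720-0-1"); this is a sketch of the read semantics, not Oak's implementation.]

```python
class RevisionedProperty:
    """A property stored as a map of revision -> value, read at a given
    head revision, mimicking the Mongo document shape in the mail above."""

    def __init__(self):
        self.values = {}  # revision -> value

    def set(self, revision, value):
        self.values[revision] = value

    def read(self, head_revision):
        # A reader only sees values committed at revisions <= its head;
        # of those, the newest one wins.
        visible = [r for r in self.values if r <= head_revision]
        if not visible:
            return None
        return self.values[max(visible)]
```

Two cluster nodes with different head revisions therefore read different values from the very same backend document, which is exactly the snapshot-isolation effect being discussed.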
Re: Not-sticky sessions with Sling?
Alright, this is a deal breaker for our business (if Sling absolutely requires sticky sessions). I hope you're not offended that I'm not 100% convinced yet. I understand you do development on the Sling project and are well qualified on the topic. To be honest, however, I don't fully understand what you said in your last post, and I also know that AEM 6.1 can do what I'd like, which is really just Sling+Oak. If they can do it, I don't understand why we can't.

ref:
https://docs.adobe.com/docs/en/aem/6-1/administer/security/encapsulated-token.html

I'd hate to throw away all the awesome progress we've made with Sling so far when I know that AEM, which is just Sling + Jackrabbit, can accomplish app-server-agnostic authentication, and thus avoid sticky sessions.

Although I don't understand this "head revision" that you've described, and that's inexperience on my part, I am confident that you're telling me the following: when there is only one Mongo instance in existence, and all Sling instances get data from it, then directly after "sling-instance-1" writes "myProperty=myValue" to the JCR, "sling-instance-2" could still get some old value of "myProperty". This only seems possible to me if one of the following is true:

A) The Sling instances are caching values from Mongo (perhaps Sling or Oak is doing that?)
B) There are separate versions of that property stored in Mongo (perhaps this is what you meant by the word revision) and it's possible for a sling-instance to be reading an old version of a property from Mongo.
C) Mongo isn't consistent.

We know from the Mongo documentation that C isn't true - Mongo is consistent when reading from the primary replica set. So it must be that A or B is going on? And if so, what is your guess about how AEM 6, which is Sling+Oak, avoids this pitfall when they very clearly support the stateless architecture (ie not-sticky) that I'm planning?
Re: Not-sticky sessions with Sling?
On Fri, Jan 13, 2017 at 12:20 AM, lancedolan wrote:
> In an architecture with
> only one Mongo instance, the moment one instance writes to the JCR, another
> instance will read the same data and agree consistently. It seems to me that
> the JCR state is strongly consistent.

No. The DocumentNodeStore in each Sling node which is part of the cluster periodically polls the backend for the root node state revision. If any change is detected, it updates its head revision to match the last seen root node revision from Mongo and then generates an external observation event. So any change done on cluster node N1 becomes _visible sometime later_ on cluster node N2. If you create a node on N1 and immediately try to read it on N2, that read may fail, as the change might not yet be "visible" on the other cluster node. Any new session opened on N2 has its base revision set to the current head revision of that cluster node, which may be older than the current head revision in Mongo.

However, the writes would still be consistent. So if you modify the same property concurrently from different cluster nodes, then one of the writes would succeed and the other would fail with a conflict. Some details are provided at [1].

Chetan Mehrotra

[1] https://jackrabbit.apache.org/oak/docs/architecture/transactional-model.html
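[Editor's note: a toy model of the two behaviours Chetan describes — delayed visibility via a periodic background read, and conflicting concurrent writes. Integer revisions and manual `background_read()` calls are simplifications; own-session visibility is ignored, and none of this is Oak API.]

```python
class Backend:
    """Shared store (think Mongo): one global head revision plus
    per-path revision -> value maps."""

    def __init__(self):
        self.head = 0
        self.docs = {}  # path -> {revision: value}

    def commit(self, base_revision, path, value):
        # Simplified conflict detection: reject the write if the path
        # changed after the writer's base revision.
        if any(r > base_revision for r in self.docs.get(path, {})):
            raise RuntimeError("conflict on %s" % path)
        self.head += 1
        self.docs.setdefault(path, {})[self.head] = value
        return self.head


class ClusterNode:
    """A Sling/Oak cluster node with its own lagging head revision."""

    def __init__(self, backend):
        self.backend = backend
        self.head = backend.head  # only advances on background_read()

    def background_read(self):
        self.head = self.backend.head

    def read(self, path):
        visible = {r: v for r, v in self.backend.docs.get(path, {}).items()
                   if r <= self.head}
        return visible[max(visible)] if visible else None

    def write(self, path, value):
        return self.backend.commit(self.head, path, value)
```

A write on N1 stays invisible on N2 until N2's background read runs, yet a concurrent write from N2's stale base revision fails with a conflict rather than silently clobbering N1's change.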
Re: Not-sticky sessions with Sling?
Chetan, I'd like to confirm to what degree that is true for our proposed architecture. It seems that the only "state" stored in Sling instances is OSGI configurations and OSGI bundles, so only those would be "eventually consistent." Everything else is in the JCR, which Mongo can provide as strongly consistent (I believe).

Consider this example and correct me where I'm wrong. I'd hate to shoot myself in the foot with bad assumptions. Imagine 3 Sling instances all talking to 1 Mongo instance. In this case, it seems to me that all repo state is captured in a single Mongo instance, which is consistent by default; eventual consistency only happens if you hit secondary members of a Mongo replica set. In an architecture with only one Mongo instance, the moment one instance writes to the JCR, another instance will read the same data and agree consistently. It seems to me that the JCR state is strongly consistent.

However, OSGI configurations seem to propagate to each other through the JCR only eventually... Additionally, when we deploy a new OSGI bundle to the JCR (in an install directory or whatever), those seem to only eventually propagate to all Sling instances. I'm not totally sure that these are "eventual," but they seem like the only place that state will be only "eventual" in this architecture.

So, as long as we're cool with OSGI configurations and bundle installations being eventual, everything else, stored in the JCR, should be strongly consistent, right? And then, I believe we can even scale the Mongo instances into a replica set for better availability and we'll still be strongly consistent so long as all Sling instances only read from the primary member of the replica set: [1].

Thanks for your time and thoughts dude!
[1] https://www.mongodb.com/faq#consistency
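[Editor's note: the "read only from the primary" constraint Lance mentions can be pinned in the MongoDB connection string itself. A sketch, with hypothetical hostnames and replica-set name:]

```
mongodb://mongo1:27017,mongo2:27017,mongo3:27017/sling?replicaSet=rs0&readPreference=primary&w=majority
```

`readPreference=primary` directs all reads to the primary member (this is also MongoDB's default), and `w=majority` makes writes wait for acknowledgement by a majority of the replica set. Note that, per the rest of this thread, this addresses consistency between Sling and Mongo only; it does not remove the lag between each Oak node's head revision and Mongo's.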
Re: Not-sticky sessions with Sling?
If you are running a cluster with Sling on Oak/Mongo then sticky sessions would be required due to the eventually consistent nature of the repository. Changes done on one cluster node would not be immediately visible on the other cluster nodes. Hence, to provide a consistent user experience, sticky sessions would be required.

Chetan Mehrotra

On Thu, Jan 12, 2017 at 7:34 AM, lancedolan wrote:
> The only example code I can find to authenticate to Sling will use the JEE
> servlet container's "j_security_check" which then stores the authenticated
> session in App Server memory. A load-balancer without sticky-sessions
> enabled will cause an unstable experience for users, in which they are
> suddenly unauthenticated.
>
> - Does Sling already offer a mechanism for authenticating without storing
> that JCR session in the Servlet Container Session?
> - Do any of you avoid sticky sessions without writing custom code?
>
> I'm thinking that this problem *must* be solved already. Either there's an
> AuthenticationHandler in Sling that I haven't found yet, or there's an
> open-source example that somebody could share with me :)
>
> If I must write this myself, is this the best place to start?
> https://sling.apache.org/documentation/the-sling-engine/authentication/authentication-authenticationhandler.html
> https://sling.apache.org/apidocs/sling8/org/apache/sling/auth/core/spi/AuthenticationHandler.html
>
> ... as usual, thanks guys. I realize I'm really dominating the mail list
> lately. I've got a lot to solve :)
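[Editor's note: the kind of stateless credential discussed in this thread (e.g. AEM's encapsulated token) boils down to a signed, self-contained token that any instance can verify with a shared secret, with no server-side session. A minimal HMAC sketch; the key, token format and field names are hypothetical, not Sling's or AEM's actual scheme.]

```python
import hmac
import hashlib
import time

SECRET = b"shared-across-all-instances"  # hypothetical shared signing key

def issue_token(user, ttl=3600, now=None):
    """Return a token of the form user:expiry:HMAC(user:expiry).
    Any instance holding SECRET can verify it without shared state."""
    expiry = int(now if now is not None else time.time()) + ttl
    payload = "%s:%d" % (user, expiry)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (payload, sig)

def verify_token(token, now=None):
    """Return the user name if the token is authentic and unexpired,
    else None."""
    try:
        user, expiry, sig = token.rsplit(":", 2)
    except ValueError:
        return None
    payload = "%s:%s" % (user, expiry)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    if int(expiry) < (now if now is not None else time.time()):
        return None
    return user
```

In a Sling deployment this logic would live inside a custom AuthenticationHandler's `extractCredentials`, which is the extension point the links above describe.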