Re: Join new member to Sling Oak cluster?
>> /tasks/HistoryCleanUpTask,sling/webconsole/test,
>> org.apache.sling.instance.name=Instance 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0,
>> org.apache.sling.instance.description=Instance with id 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0 and run modes [oak_mongo, oak]}],
>> an InstanceDescription[slindId=cc9e3513-8f6c-460d-b27e-dd3844aa629d, isLeader=false, isOwn=false,
>> clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d,
>> properties={org.apache.sling.instance.endpoints=,
>> org.apache.sling.event.jobs.consumer.topics=/,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test,
>> org.apache.sling.instance.name=Instance cc9e3513-8f6c-460d-b27e-dd3844aa629d,
>> org.apache.sling.instance.description=Instance with id cc9e3513-8f6c-460d-b27e-dd3844aa629d and run modes [oak_mongo, oak]}]]]).
>>
>> 09.01.2017 16:20:37.171 *INFO* [DocumentDiscoveryLiteService-BackgroundWorker-[1]] org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService checkView: view changed from: a ClusterView[{"seq":3,"final":false,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[2],"inactive":[]}], to: a ClusterView[{"seq":3,"final":true,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[],"inactive":[2]}], hasInstancesWithBacklog: false
>> ***
>> From there onward, the log file is just the normal shut-down logs as OSGI unregisters bundles and the servlet container shuts down.
>>
>> --
>> View this message in context: http://apache-sling.73963.n3.nabble.com/Join-new-member-to-Sling-Oak-cluster-tp4069454p4069492.html
>> Sent from the Sling - Users mailing list archive at Nabble.com.
Re: Join new member to Sling Oak cluster?
Oak complains if it detects a major difference in clock times between the cluster nodes.

2017-01-14 23:20 GMT+01:00 John Logan:
> Robert Munteanu wrote:
> > I am unable to dig up any documentation on this from the Oak side,
> > sorry. Perhaps you have better luck on oak-...@jackrabbit.apache.org .
>
> The most relevant online doc I could find regarding clustering was:
>
> http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html
>
> AFAICT it didn't say anything about clock synchronization, but the MVCC
> revision names do have a timestamp component.
>
> John

--
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com
Twitter: @joerghoh
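For anyone who wants to rule out clock drift before digging further, a rough check could look like the sketch below. This is not Oak's own check: the class name, the method names and the 2-second tolerance are assumptions made for illustration, and the remote timestamp would have to come from whatever the other node or the MongoDB server reports. The offset is estimated NTP-style, assuming the network delay is roughly the same in both directions.

```java
import java.time.Duration;

// Hypothetical helper, not part of Sling or Oak.
final class ClockSkewCheck {

    /**
     * Estimates (remote clock - local clock) in milliseconds.
     * Assumes the remote timestamp was taken halfway through the round trip.
     */
    static long estimateOffsetMillis(long localSendMillis, long remoteMillis, long localReceiveMillis) {
        long midpoint = localSendMillis + (localReceiveMillis - localSendMillis) / 2;
        return remoteMillis - midpoint;
    }

    /** True if the absolute offset does not exceed the given tolerance. */
    static boolean withinTolerance(long offsetMillis, Duration tolerance) {
        return Math.abs(offsetMillis) <= tolerance.toMillis();
    }

    public static void main(String[] args) {
        // Assumed tolerance for this sketch only; pick whatever your setup can live with.
        Duration tolerance = Duration.ofSeconds(2);

        // Request sent at t=1000, answered at t=1100, remote reported 1600.
        long offset = estimateOffsetMillis(1000L, 1600L, 1100L);
        System.out.println("estimated offset (ms): " + offset); // prints 550
        System.out.println("within tolerance: " + withinTolerance(offset, tolerance)); // prints true
    }
}
```

With everything on one machine, as in the setup described in this thread, the estimated offset should be close to zero, which would point away from clock drift as the cause.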
Re: Join new member to Sling Oak cluster?
Robert Munteanu wrote:
> I am unable to dig up any documentation on this from the Oak side,
> sorry. Perhaps you have better luck on oak-...@jackrabbit.apache.org .

The most relevant online doc I could find regarding clustering was:

http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html

AFAICT it didn't say anything about clock synchronization, but the MVCC revision names do have a timestamp component.

John
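To make the timestamp component concrete, here is a small sketch that pulls it out of a revision string. It assumes the layout "r" + hex timestamp in milliseconds + "-" + hex counter + "-" + hex cluster id, which is how I read the DocumentMK page linked above; that layout is an assumption worth checking against your Oak version. The class is hypothetical, does not use any Oak API, and ignores branch revisions.

```java
import java.time.Instant;

// Hypothetical parser for illustration; not an Oak class.
final class RevisionTimestamp {
    final long timestampMillis;
    final int counter;
    final int clusterId;

    private RevisionTimestamp(long timestampMillis, int counter, int clusterId) {
        this.timestampMillis = timestampMillis;
        this.counter = counter;
        this.clusterId = clusterId;
    }

    /** Parses "r<hexMillis>-<hexCounter>-<hexClusterId>" (assumed layout). */
    static RevisionTimestamp parse(String revision) {
        if (!revision.startsWith("r")) {
            throw new IllegalArgumentException("Not a plain revision: " + revision);
        }
        String[] parts = revision.substring(1).split("-");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected three parts: " + revision);
        }
        return new RevisionTimestamp(
                Long.parseLong(parts[0], 16),
                Integer.parseInt(parts[1], 16),
                Integer.parseInt(parts[2], 16));
    }

    /** The wall-clock time at which the writing node created the revision. */
    Instant instant() {
        return Instant.ofEpochMilli(timestampMillis);
    }

    public static void main(String[] args) {
        // Build a sample revision from the current time, as cluster node 1.
        String sample = "r" + Long.toHexString(System.currentTimeMillis()) + "-0-1";
        RevisionTimestamp rev = parse(sample);
        System.out.println(sample + " was written at " + rev.instant()
                + " by cluster node " + rev.clusterId);
    }
}
```

Since each node stamps revisions with its own wall-clock time, it is at least plausible that badly skewed clocks would confuse revision ordering across nodes.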
Re: Join new member to Sling Oak cluster?
Robert Munteanu-2 wrote
> - Are the Sling/Oak instances and MongoDB clocks in sync?

I've just realized the significance of this question. Our Sling and Mongo instances will be in entirely different data centers: Mongo provided as a service, and Sling on our own AWS instances somewhere. I suppose in this distributed environment we don't have any strong guarantees about the clocks being in sync.

Is this a known requirement of clustering?
Re: Join new member to Sling Oak cluster?
> - What kind of discovery mechanism do you use?

I don't know. Whichever is the default discovery mechanism when running org.apache.sling.launchpad-8-webapp.war on Tomcat 8. I've modified the content of the war file so that it starts with run modes oak|oak_mongo, so by default it looks for Mongo on localhost.

> - Are the Sling/Oak instances and MongoDB clocks in sync?

I didn't realize this is important, but they must be, as it's all on my local machine. I'm not even using Vagrant or virtual machines, just a bunch of stuff on local macOS.

> - Do you have anything suspicious in the error logs when the instances shut themselves down?

See my reply above. Nothing very indicative, just the instance declaring that the clusterview has changed and that it is no longer in it...

> - What happens if after the instances are shut down and a new leader is elected you restart one of the old instances?

Then they boot out the other guy. They just take turns booting each other out. To be specific, I had a Sling instance on :8080 and another on :8081, both connected to Mongo on localhost. Everything worked great as I developed on them for a good 6 hours. Then I added an instance on :8082, and when it connected to Mongo on localhost, Tomcat 8080 and 8081 both shut down; the log file is just their mutual agreement that the new clusterview doesn't include them. I immediately started 8080 and 8081 back up again, and the same thing happened in reverse, with 8082 shutting itself down.
Re: Join new member to Sling Oak cluster?
Not sure if I've experienced an intermittent but severe defect, or if I did something wrong when attempting this yesterday. I followed the exact same steps today, and the new instance *did* successfully join. Either I'm wrong, and I actually did something differently yesterday, or this is an intermittent defect and a big bummer. I'll keep trying to reproduce.

Here is the log output from one of the instances that was already in the cluster when the new one joined:

**
09.01.2017 16:20:36.062 *INFO* [DocumentDiscoveryLiteService-BackgroundWorker-[1]] org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService doCheckView: view has changed from: a ClusterView[valid=true, viewSeqNum=2, clusterViewId=ae406105-3c77-4498-b8fc-1557b012d46d, activeIds=1,2, recoveringIds=null, inactiveIds=null] to: a ClusterView[valid=true, viewSeqNum=3, clusterViewId=ae406105-3c77-4498-b8fc-1557b012d46d, activeIds=1, recoveringIds=null, inactiveIds=2] - sending event...

09.01.2017 16:20:36.063 *INFO* [DocumentDiscoveryLiteService-BackgroundWorker-[1]] org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService checkView: view changed from: a ClusterView[{"seq":2,"final":true,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1,2],"deactivating":[],"inactive":[]}], to: a ClusterView[{"seq":3,"final":false,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[2],"inactive":[]}], hasInstancesWithBacklog: true

09.01.2017 16:20:37.168 *INFO* [Thread-31] org.apache.sling.discovery.impl.DiscoveryServiceImpl enqueueForAll: sending PROPERTIES_CHANGED to all listeners (oldView=TopologyViewImpl [current=false, super.hashCode=1259039663, instances=[an InstanceDescription[slindId=3d4f31d9-b3ba-4e32-aa10-36ef150a37f0, isLeader=true, isOwn=true, clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d, properties={org.apache.sling.instance.endpoints=,
org.apache.sling.event.jobs.consumer.topics=/,com/composum/sling/core/pckgmgr/PackageJobExecutor,com/composum/sling/core/script/GroovyJobExecutor,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test, org.apache.sling.instance.name=Instance 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0, org.apache.sling.instance.description=Instance with id 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0 and run modes [oak_mongo, oak]}], an InstanceDescription[slindId=cc9e3513-8f6c-460d-b27e-dd3844aa629d, isLeader=false, isOwn=false, clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d, properties={org.apache.sling.instance.endpoints=, org.apache.sling.event.jobs.consumer.topics=/,com/composum/sling/core/pckgmgr/PackageJobExecutor,com/composum/sling/core/script/GroovyJobExecutor,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test, org.apache.sling.instance.name=Instance cc9e3513-8f6c-460d-b27e-dd3844aa629d, org.apache.sling.instance.description=Instance with id cc9e3513-8f6c-460d-b27e-dd3844aa629d and run modes [oak_mongo, oak]}]]], newView=TopologyViewImpl [current=true, super.hashCode=238080023, instances=[an InstanceDescription[slindId=3d4f31d9-b3ba-4e32-aa10-36ef150a37f0, isLeader=true, isOwn=true, clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d, properties={org.apache.sling.instance.endpoints=, org.apache.sling.event.jobs.consumer.topics=/,com/composum/sling/core/pckgmgr/PackageJobExecutor,com/composum/sling/core/script/GroovyJobExecutor,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test, org.apache.sling.instance.name=Instance 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0, org.apache.sling.instance.description=Instance with id 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0 and run modes [oak_mongo, oak]}], an InstanceDescription[slindId=cc9e3513-8f6c-460d-b27e-dd3844aa629d, isLeader=false, isOwn=false, clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d, properties={org.apache.sling.instance.endpoints=, 
org.apache.sling.event.jobs.consumer.topics=/,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test, org.apache.sling.instance.name=Instance cc9e3513-8f6c-460d-b27e-dd3844aa629d, org.apache.sling.instance.description=Instance with id cc9e3513-8f6c-460d-b27e-dd3844aa629d and run modes [oak_mongo, oak]}]]]). 09.01.2017 16:20:37.171 *INFO* [DocumentDiscoveryLiteService-BackgroundWorker-[1]] org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService checkView: view changed from: a ClusterView[{"seq":3,"final":false,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[2],"inactive":[]}], to: a ClusterView[{"seq":3,"final":true,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[],"inactive
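The checkView entries in the log above carry each cluster view as a small JSON object with "active", "deactivating" and "inactive" id lists. When comparing many such entries, a throwaway helper like the sketch below can show which ids dropped out of the active set between two views. The class and method names are hypothetical, and the regex-based extraction is only meant for log lines shaped like the ones in this message, not as a general JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper for illustration; not part of Sling or Oak.
final class ClusterViewLog {

    /** Returns the ids listed under the given field, e.g. "active" or "inactive". */
    static List<Integer> ids(String viewJson, String field) {
        // The leading quote keeps "active" from matching inside "inactive".
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\":\\[([^\\]]*)\\]");
        Matcher m = p.matcher(viewJson);
        List<Integer> result = new ArrayList<>();
        if (m.find()) {
            for (String token : m.group(1).split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    result.add(Integer.parseInt(trimmed));
                }
            }
        }
        return result;
    }

    /** Ids that were active in the old view but are no longer active in the new one. */
    static List<Integer> droppedFromActive(String oldView, String newView) {
        List<Integer> dropped = new ArrayList<>(ids(oldView, "active"));
        dropped.removeAll(ids(newView, "active"));
        return dropped;
    }

    public static void main(String[] args) {
        // The two views from the 16:20:36.063 checkView entry.
        String oldView = "{\"seq\":2,\"final\":true,\"id\":\"ae406105-3c77-4498-b8fc-1557b012d46d\","
                + "\"me\":1,\"active\":[1,2],\"deactivating\":[],\"inactive\":[]}";
        String newView = "{\"seq\":3,\"final\":false,\"id\":\"ae406105-3c77-4498-b8fc-1557b012d46d\","
                + "\"me\":1,\"active\":[1],\"deactivating\":[2],\"inactive\":[]}";

        System.out.println("dropped from active: " + droppedFromActive(oldView, newView)); // prints [2]
        System.out.println("deactivating: " + ids(newView, "deactivating")); // prints [2]
    }
}
```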
Re: Join new member to Sling Oak cluster?
Hi,

On Mon, 2017-01-09 at 17:52 -0700, lancedolan wrote:
> Hey guys, sorry for multiple recent question. I'm biting down hard on
> Sling right now and hitting tons of learning curve and growing pains.

Don't worry, we like these kinds of questions :-)

> My problem: If I create a fresh instance of MongoDB, and connect
> multiple fresh instances of Sling to it (each running in a separate
> tomcat instance), they all plug-and-play happily. They just discover
> each other and my clusterview is very stable at /system/console/topology.
>
> However, if I used the cluster for a while (deploy some OSGI bundles,
> create some JCR content) and *then* connect a new sling instance, what
> happens is that all of the current instances shut down (they literally
> send a shut down signal to tomcat's shut down port) and then the single
> new instance votes itself as the new leader, and only member, of a new
> 1-instance cluster.

That sounds unintended. I am not the best person to debug this but I suppose the following information will be useful:

- What kind of discovery mechanism do you use?
- Are the Sling/Oak instances and MongoDB clocks in sync?
- Do you have anything suspicious in the error logs when the instances shut themselves down?
- What happens if after the instances are shut down and a new leader is elected you restart one of the old instances?

> Is this a known issue? Do I need to "prime" my new member with the
> current state of the cluster before connecting it to the cluster or
> something (perhaps by uploading all the bundles and content that has
> been uploaded to the cluster?)

It's not known to me at least and to my knowledge there should be no extra steps needed.

Thanks,

Robert
Join new member to Sling Oak cluster?
Hey guys, sorry for the multiple recent questions. I'm biting down hard on Sling right now and hitting tons of learning curve and growing pains.

My problem: If I create a fresh instance of MongoDB and connect multiple fresh instances of Sling to it (each running in a separate Tomcat instance), they all plug-and-play happily. They just discover each other, and my clusterview is very stable at /system/console/topology.

However, if I use the cluster for a while (deploy some OSGi bundles, create some JCR content) and *then* connect a new Sling instance, all of the current instances shut down (they literally send a shutdown signal to Tomcat's shutdown port), and then the single new instance votes itself the new leader, and only member, of a new 1-instance cluster.

Is this a known issue? Do I need to "prime" my new member with the current state of the cluster before connecting it to the cluster (perhaps by uploading all the bundles and content that have been uploaded to the cluster)?