Re: Join new member to Sling Oak cluster?

2017-01-25 Thread Stefan Egli
>> /tasks/HistoryCleanUpTask,sling/webconsole/test,
>> org.apache.sling.instance.name=Instance
>> 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0,
>> org.apache.sling.instance.description=Instance with id
>> 3d4f31d9-b3ba-4e32-aa10-36ef150a37f0 and run modes [oak_mongo,
>> oak]}], an
>> InstanceDescription[slindId=cc9e3513-8f6c-460d-b27e-dd3844aa629d,
>> isLeader=false, isOwn=false,
>> clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d,
>> properties={org.apache.sling.instance.endpoints=,
>> org.apache.sling.event.jobs.consumer.topics=/,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test,
>> org.apache.sling.instance.name=Instance
>> cc9e3513-8f6c-460d-b27e-dd3844aa629d,
>> org.apache.sling.instance.description=Instance with id
>> cc9e3513-8f6c-460d-b27e-dd3844aa629d and run modes [oak_mongo,
>> oak]}]]]).
>> 09.01.2017 16:20:37.171 *INFO*
>> [DocumentDiscoveryLiteService-BackgroundWorker-[1]]
>> org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService
>> checkView: view changed from: a
>> ClusterView[{"seq":3,"final":false,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[2],"inactive":[]}],
>> to: a
>> ClusterView[{"seq":3,"final":true,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[],"inactive":[2]}],
>> hasInstancesWithBacklog: false
>> ***
>> From there onward, the log file is just the normal shutdown logs as
>> OSGi unregisters bundles and the servlet container shuts down.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://apache-sling.73963.n3.nabble.com/Join-new-member-to-Sling-Oak-cluster-tp4069454p4069492.html
>> Sent from the Sling - Users mailing list archive at Nabble.com.
>




Re: Join new member to Sling Oak cluster?

2017-01-14 Thread Jörg Hoh
Oak complains if it detects a major difference in clock times between the
cluster nodes.

2017-01-14 23:20 GMT+01:00 John Logan :

> Robert Munteanu wrote:
> > I am unable to dig up any documentation on this from the Oak side,
> > sorry. Perhaps you have better luck on oak-...@jackrabbit.apache.org .
>
> The most relevant online doc I could find regarding clustering was:
>
> http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html
>
> AFAICT it didn't say anything about clock synchronization, but the MVCC
> revision names do have a timestamp component.
>
> John
>



-- 
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com
Twitter: @joerghoh
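Jörg's warning suggests checking clock offsets before suspecting discovery itself. The Java sketch below is not Sling or Oak code; it estimates how far a remote clock is from the local one using Cristian's algorithm, i.e. it assumes the remote reading was taken halfway through the round trip. The remote time source is passed in as a `LongSupplier` and is hypothetical, as is the 2-second tolerance; in a real setup the reading would come from a server-time query against MongoDB or another cluster node.

```java
import java.util.function.LongSupplier;

public class ClockSkewCheck {

    // Estimates (remote clock - local clock) in milliseconds with Cristian's algorithm:
    // the remote reading is assumed to have been taken at the midpoint of the round trip.
    static long estimateOffsetMillis(LongSupplier localClock, LongSupplier remoteClock) {
        long sent = localClock.getAsLong();
        long remote = remoteClock.getAsLong(); // hypothetical remote time query
        long received = localClock.getAsLong();
        long midpoint = sent + (received - sent) / 2;
        return remote - midpoint;
    }

    // toleranceMillis is a hypothetical threshold, not a value taken from Oak.
    static boolean withinTolerance(long offsetMillis, long toleranceMillis) {
        return Math.abs(offsetMillis) <= toleranceMillis;
    }

    public static void main(String[] args) {
        LongSupplier local = System::currentTimeMillis;
        // Stand-in for a remote node whose clock runs 5 seconds ahead.
        LongSupplier remote = () -> System.currentTimeMillis() + 5_000;
        long offset = estimateOffsetMillis(local, remote);
        System.out.println("estimated offset: " + offset + " ms, within 2 s: "
                + withinTolerance(offset, 2_000));
    }
}
```

Taking the midpoint cancels symmetric network latency; asymmetric latency still shows up as error (at most half the round-trip time), so treat the result as an estimate rather than a precise measurement.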


Re: Join new member to Sling Oak cluster?

2017-01-14 Thread John Logan
Robert Munteanu wrote:
> I am unable to dig up any documentation on this from the Oak side,
> sorry. Perhaps you have better luck on oak-...@jackrabbit.apache.org .

The most relevant online doc I could find regarding clustering was:

http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html

AFAICT it didn't say anything about clock synchronization, but the MVCC
revision names do have a timestamp component.

John
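To make John's point about the timestamp component concrete, here is a small Java sketch (Java 16+ for `record`) that splits a revision string into its fields. It assumes the `r<timestamp>-<counter>-<clusterId>` layout, with all fields hexadecimal and the timestamp in milliseconds since the epoch, as described in the DocumentMK documentation linked above. The `ParsedRevision` record and the `compareByTime` ordering are hypothetical helpers for illustration, not Oak's API. The demo shows why clock skew matters: a write made later in real time by a node whose clock lags ends up with an earlier timestamp.

```java
import java.time.Instant;

public class RevisionTimestamp {

    // Hypothetical holder for the three hex fields of a revision such as "r13f3875b5d1-0-1".
    record ParsedRevision(long timestampMillis, int counter, int clusterId) {}

    // Parses "r<timestamp>-<counter>-<clusterId>" (assumed layout, all fields hexadecimal).
    static ParsedRevision parse(String rev) {
        if (!rev.startsWith("r")) {
            throw new IllegalArgumentException("not a revision: " + rev);
        }
        String[] parts = rev.substring(1).split("-");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + rev);
        }
        return new ParsedRevision(
                Long.parseLong(parts[0], 16),
                Integer.parseInt(parts[1], 16),
                Integer.parseInt(parts[2], 16));
    }

    // Illustrative ordering: timestamp first, then counter, then cluster id.
    static int compareByTime(ParsedRevision a, ParsedRevision b) {
        int c = Long.compare(a.timestampMillis(), b.timestampMillis());
        if (c != 0) return c;
        c = Integer.compare(a.counter(), b.counter());
        if (c != 0) return c;
        return Integer.compare(a.clusterId(), b.clusterId());
    }

    public static void main(String[] args) {
        ParsedRevision r = parse("r13f3875b5d1-0-1");
        System.out.println(Instant.ofEpochMilli(r.timestampMillis())
                + " counter=" + r.counter() + " clusterId=" + r.clusterId());

        // Node 2 writes one second after node 1 in real time, but its clock lags by 3 s,
        // so its revision carries a timestamp 2 s earlier than node 1's.
        long realTime = r.timestampMillis();
        ParsedRevision node1 = new ParsedRevision(realTime, 0, 1);
        ParsedRevision node2 = new ParsedRevision(realTime + 1_000 - 3_000, 0, 2);
        System.out.println("node 2 write sorts before node 1: "
                + (compareByTime(node2, node1) < 0));
    }
}
```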


Re: Join new member to Sling Oak cluster?

2017-01-12 Thread lancedolan
Robert Munteanu-2 wrote
> - Are the Sling/Oak instances and MongoDB clocks in sync? 

I've just realized the significance of this question. Our Sling and Mongo
instances will be in entirely different data centers: Mongo is provided as a
service, and Sling runs on our own AWS instances somewhere. I suppose that in
such a distributed environment we don't have any strong guarantee that the
clocks are in sync. Is this a known requirement of clustering?



--
View this message in context: 
http://apache-sling.73963.n3.nabble.com/Join-new-member-to-Sling-Oak-cluster-tp4069454p4069553.html


Re: Join new member to Sling Oak cluster?

2017-01-10 Thread lancedolan
- What kind of discovery mechanism do you use? 

I don't know; whichever discovery mechanism is the default when running
org.apache.sling.launchpad-8-webapp.war on Tomcat 8. I've modified the
contents of the WAR file so that it starts with the run modes oak and
oak_mongo, so by default it looks for MongoDB on localhost.

- Are the Sling/Oak instances and MongoDB clocks in sync? 
I didn't realize this was important, but they must be, since it's all on my
local machine. I'm not even using Vagrant or virtual machines, just a bunch of
stuff running locally on macOS.

- Do you have anything suspicious in the error logs when the instances 
shut themselves down? 
See my earlier reply. Nothing very telling, just the instance declaring that
the cluster view has changed and that it is no longer part of it.

- What happens if after the instances are shut down and a new leader is 
elected you restart one of the old instances? 
Then they boot out the other guy. They just take turns booting each other
out. To be specific: I had a Sling instance on :8080 and another on :8081,
both connected to MongoDB on localhost. Everything worked great while I
developed on them for a good 6 hours. Then I added an instance on :8082, and
when it connected to MongoDB on localhost, the Tomcats on 8080 and 8081 both
shut down; their log files show only their mutual agreement that the new
cluster view doesn't include them. I immediately started 8080 and 8081 back
up, and the same thing happened in reverse, with 8082 shutting itself down.



--
View this message in context: 
http://apache-sling.73963.n3.nabble.com/Join-new-member-to-Sling-Oak-cluster-tp4069454p4069493.html


Re: Join new member to Sling Oak cluster?

2017-01-10 Thread lancedolan
I'm not sure whether I've hit an intermittent but severe defect, or whether I
did something wrong when attempting this yesterday.

I followed exactly the same steps today, and the new instance *did* join
successfully. Either I'm wrong and actually did something differently
yesterday, or this is an intermittent defect, which would be a big bummer.
I'll keep trying to reproduce it.

Here is the log output from one of the instances that was already in the
cluster when the new one joined:

**
09.01.2017 16:20:36.062 *INFO*
[DocumentDiscoveryLiteService-BackgroundWorker-[1]]
org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService
doCheckView: view has changed from: a ClusterView[valid=true, viewSeqNum=2,
clusterViewId=ae406105-3c77-4498-b8fc-1557b012d46d, activeIds=1,2,
recoveringIds=null, inactiveIds=null] to: a ClusterView[valid=true,
viewSeqNum=3, clusterViewId=ae406105-3c77-4498-b8fc-1557b012d46d,
activeIds=1, recoveringIds=null, inactiveIds=2] - sending event...
09.01.2017 16:20:36.063 *INFO*
[DocumentDiscoveryLiteService-BackgroundWorker-[1]]
org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService
checkView: view changed from: a
ClusterView[{"seq":2,"final":true,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1,2],"deactivating":[],"inactive":[]}],
to: a
ClusterView[{"seq":3,"final":false,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[2],"inactive":[]}],
hasInstancesWithBacklog: true
09.01.2017 16:20:37.168 *INFO* [Thread-31]
org.apache.sling.discovery.impl.DiscoveryServiceImpl enqueueForAll: sending
PROPERTIES_CHANGED to all listeners (oldView=TopologyViewImpl
[current=false, super.hashCode=1259039663, instances=[an
InstanceDescription[slindId=3d4f31d9-b3ba-4e32-aa10-36ef150a37f0,
isLeader=true, isOwn=true,
clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d,
properties={org.apache.sling.instance.endpoints=,
org.apache.sling.event.jobs.consumer.topics=/,com/composum/sling/core/pckgmgr/PackageJobExecutor,com/composum/sling/core/script/GroovyJobExecutor,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test,
org.apache.sling.instance.name=Instance
3d4f31d9-b3ba-4e32-aa10-36ef150a37f0,
org.apache.sling.instance.description=Instance with id
3d4f31d9-b3ba-4e32-aa10-36ef150a37f0 and run modes [oak_mongo, oak]}], an
InstanceDescription[slindId=cc9e3513-8f6c-460d-b27e-dd3844aa629d,
isLeader=false, isOwn=false,
clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d,
properties={org.apache.sling.instance.endpoints=,
org.apache.sling.event.jobs.consumer.topics=/,com/composum/sling/core/pckgmgr/PackageJobExecutor,com/composum/sling/core/script/GroovyJobExecutor,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test,
org.apache.sling.instance.name=Instance
cc9e3513-8f6c-460d-b27e-dd3844aa629d,
org.apache.sling.instance.description=Instance with id
cc9e3513-8f6c-460d-b27e-dd3844aa629d and run modes [oak_mongo, oak]}]]],
newView=TopologyViewImpl [current=true, super.hashCode=238080023,
instances=[an
InstanceDescription[slindId=3d4f31d9-b3ba-4e32-aa10-36ef150a37f0,
isLeader=true, isOwn=true,
clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d,
properties={org.apache.sling.instance.endpoints=,
org.apache.sling.event.jobs.consumer.topics=/,com/composum/sling/core/pckgmgr/PackageJobExecutor,com/composum/sling/core/script/GroovyJobExecutor,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test,
org.apache.sling.instance.name=Instance
3d4f31d9-b3ba-4e32-aa10-36ef150a37f0,
org.apache.sling.instance.description=Instance with id
3d4f31d9-b3ba-4e32-aa10-36ef150a37f0 and run modes [oak_mongo, oak]}], an
InstanceDescription[slindId=cc9e3513-8f6c-460d-b27e-dd3844aa629d,
isLeader=false, isOwn=false,
clusterViewId=20a0797b-e957-4b8e-845b-9b9e0b6a246d,
properties={org.apache.sling.instance.endpoints=,
org.apache.sling.event.jobs.consumer.topics=/,org/apache/sling/event/impl/jobs/tasks/HistoryCleanUpTask,sling/webconsole/test,
org.apache.sling.instance.name=Instance
cc9e3513-8f6c-460d-b27e-dd3844aa629d,
org.apache.sling.instance.description=Instance with id
cc9e3513-8f6c-460d-b27e-dd3844aa629d and run modes [oak_mongo, oak]}]]]).
09.01.2017 16:20:37.171 *INFO*
[DocumentDiscoveryLiteService-BackgroundWorker-[1]]
org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService
checkView: view changed from: a
ClusterView[{"seq":3,"final":false,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[2],"inactive":[]}],
to: a
ClusterView[{"seq":3,"final":true,"id":"ae406105-3c77-4498-b8fc-1557b012d46d","me":1,"active":[1],"deactivating":[],"inactive

Re: Join new member to Sling Oak cluster?

2017-01-10 Thread Robert Munteanu
Hi,

On Mon, 2017-01-09 at 17:52 -0700, lancedolan wrote:
> Hey guys, sorry for multiple recent question. I'm biting down hard on
> Sling right now and hitting tons of learning curve and growing pains.

Don't worry, we like these kinds of questions :-)

> 
> My problem: If I create a fresh instance of MongoDB, and connect multiple
> fresh instances of Sling to it (each running in a separate tomcat instance),
> they all plug-and-play happily. They just discover each other and my
> clusterview is very stable at /system/console/topology.
> 
> However, if I used the cluster for a while (deploy some OSGI bundles, create
> some JCR content) and *then* connect a new sling instance, what happens is
> that all of the current instances shut down (they literally send a shut down
> signal to tomcat's shut down port) and then the single new instance votes
> itself as the new leader, and only member, of a new 1-instance cluster.

That sounds unintended. I am not the best person to debug this, but I
suppose the following information will be useful:

- What kind of discovery mechanism do you use?
- Are the Sling/Oak instances and MongoDB clocks in sync?
- Do you have anything suspicious in the error logs when the instances
shut themselves down?
- What happens if, after the instances have shut down and a new leader has
been elected, you restart one of the old instances?

> Is this a known issue? Do I need to "prime" my new member with the current
> state of the cluster before connecting it to the cluster or something
> (perhaps by uploading all the bundles and content that has been uploaded to
> the cluster?)

It's not known to me, at least, and to my knowledge no extra steps should
be needed.

Thanks,

Robert


Join new member to Sling Oak cluster?

2017-01-09 Thread lancedolan
Hey guys, sorry for the multiple recent questions. I'm biting down hard on
Sling right now and hitting a steep learning curve and lots of growing pains.

My problem: if I create a fresh MongoDB instance and connect multiple fresh
Sling instances to it (each running in a separate Tomcat instance), they all
plug and play happily. They just discover each other, and my cluster view at
/system/console/topology is very stable.

However, if I use the cluster for a while (deploy some OSGi bundles, create
some JCR content) and *then* connect a new Sling instance, all of the current
instances shut down (they literally send a shutdown signal to Tomcat's
shutdown port), and the single new instance then votes itself the new leader,
and only member, of a new one-instance cluster.

Is this a known issue? Do I need to "prime" my new member with the current
state of the cluster before connecting it (perhaps by uploading all the
bundles and content that have been uploaded to the cluster)?



--
View this message in context: 
http://apache-sling.73963.n3.nabble.com/Join-new-member-to-Sling-Oak-cluster-tp4069454.html