Hi Imesh,
I finally got round to a proper series of tests, and here are the conclusions:
· In Stratos 4.0, after a Pacemaker-driven failover, the newly Active
Stratos has lost all Cartridge Definitions.
· In current [1] Stratos 4.1, after a Pacemaker-driven failover, the
newly Active Stratos:
o Has lost all Deployment Policies.
o Has lost contact with the Cartridge Agents, and all VMs are stuck in
whatever state they had before the failover.
· Note: I have not verified whether Cartridge Groups are lost.
I include the test results below at [2] and [3]. On this basis I am concerned
about whether 4.1 is ready for GA; more testing is no doubt possible (e.g.
Cartridge Groups), but I wanted to get this info to the list ASAP.
Thanks, Shaheed
[1] A recent build somewhere between beta 1 and beta 2, but I don’t think any
relevant fixes have been made in master.
[2] Persistence test output from Stratos 4.1. Note:
1. In the build I have, the CLI is broken for a couple of commands; these
are supplemented by direct “curl” commands further down.
2. I’ve used one of our commands to show the instances and their state for
a given application, since there is no compact JSON output or convenient
Stratos CLI command for that.
PERSISTENCE TEST, BEFORE FAILOVER
================================
stratos> list-tenants
Tenants:
+-----------------------+-----------+-------------------+--------+------------------------------+
| Domain                | Tenant ID | Email             | State  | Created Date                 |
+-----------------------+-----------+-------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | [email protected] | Active | Fri May 15 04:46:58 MDT 2015 |
+-----------------------+-----------+-------------------+--------+------------------------------+
stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+
stratos> list-deployment-policies
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+
stratos> list-application-policies
Error in listing application policies
No application policies found
stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found
stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+
$ curl -uadmin:admin -k -H'Content-type: application/json' \
  https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
$ curl -uadmin:admin -k -H'Content-type: application/json' \
  https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
PERSISTENCE TEST, AFTER FAILOVER
===============================
stratos> list-tenants
Tenants:
+-----------------------+-----------+-------------------+--------+------------------------------+
| Domain                | Tenant ID | Email             | State  | Created Date                 |
+-----------------------+-----------+-------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | [email protected] | Active | Fri May 15 05:26:52 MDT 2015 |
+-----------------------+-----------+-------------------+--------+------------------------------+
stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+
stratos> list-deployment-policies
No deployment policies found
stratos> list-application-policies
Error in listing application policies
No application policies found
stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found
stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+
$ curl -uadmin:admin -k -H'Content-type: application/json' \
  https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
$ curl -uadmin:admin -k -H'Content-type: application/json' \
  https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
[3] Cartridge test output from Stratos 4.1. Note:
1. We do not use a VIP for Stratos, either for 4.0 or 4.1.
2. We expect the Cartridge Agent to use a DNS lookup when it reconnects,
and this worked just fine in Stratos 4.0.
CARTRIDGE TEST, BEFORE FAILOVER
==============================
$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1,
members 1 (Active 1)
cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
CARTRIDGE TEST, AFTER FAILOVER
=============================
$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1,
members 1 (Active 1)
cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
CARTRIDGE TEST, AFTER FAILOVER: WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
===================================================================================
$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1,
members 1 (Active 1)
cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
From: Imesh Gunaratne [mailto:[email protected]]
Sent: 14 May 2015 20:34
To: dev
Subject: Re: Clustered deployments of Stratos
It would be better to use the REST API to query and see whether the relevant
entities are persisted. Since data is stored in binary format in the registry,
it would be difficult to query the database and verify this.
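A minimal sketch of that REST-based check, assuming a POSIX shell with curl and the admin:admin credentials used in the curl commands earlier in this thread: save each resource's JSON before the failover, save it again afterwards, and compare the two sets. Only the autoscalingPolicies and applicationPolicies endpoints appear in the test output; the other resource names in RESOURCES, the STRATOS_URL variable and the snapshot/compare_snapshots helpers are assumptions for illustration and should be checked against the 4.1 REST API documentation.

```shell
#!/bin/sh
# Sketch: snapshot Stratos REST resources before and after a failover, then
# report which resources changed. ASSUMPTIONS: the base URL, the admin:admin
# credentials, and every resource name other than autoscalingPolicies and
# applicationPolicies are illustrative and may differ in a real deployment.

STRATOS_URL="${STRATOS_URL:-https://localhost:9443/api}"
RESOURCES="autoscalingPolicies applicationPolicies deploymentPolicies cartridges networkPartitions applications"

# snapshot DIR: fetch each resource and save its JSON body as DIR/<resource>.json
snapshot() {
    mkdir -p "$1"
    for r in $RESOURCES; do
        curl -s -k -u admin:admin "$STRATOS_URL/$r" > "$1/$r.json"
    done
}

# compare_snapshots BEFORE AFTER: print "CHANGED: <resource>" for each file in
# BEFORE whose bytes differ from (or are missing in) AFTER; return 1 if any did.
compare_snapshots() {
    rc=0
    for f in "$1"/*.json; do
        name=$(basename "$f" .json)
        if ! cmp -s "$f" "$2/$name.json"; then
            echo "CHANGED: $name"
            rc=1
        fi
    done
    return $rc
}
```

Typical use would be `snapshot before`, trigger the failover, then `snapshot after` followed by `compare_snapshots before after`. The comparison is byte-for-byte, so a server that returns the same objects in a different order would also be reported as changed; a JSON-aware comparison would be more forgiving.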
On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu)
<[email protected]> wrote:
I looked at REG_RESOURCE (as well as a few others), but I’m afraid I am going
to need more specifics.
For example, what query would you recommend for looking at, say, deployment
policies and cartridge definitions?
From: Imesh Gunaratne [mailto:[email protected]]
Sent: 09 May 2015 09:08
To: dev
Subject: Re: Clustered deployments of Stratos
Yes, you could refer to the tables that have the prefix "REG_".
On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu)
<[email protected]> wrote:
Can you suggest what tables to look at?
From: Imesh Gunaratne [mailto:[email protected]]
Sent: 07 May 2015 18:00
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,
Thanks for the clarification! Maybe the problem is with the MySQL
active-passive configuration.
I understand that you are switching the same OpenStack volume from the active
node to the passive node (when the passive node becomes active), so
technically it should work. Maybe we need to investigate this problem further
by analysing whether data is persisted properly in the active node before the
passive node becomes active.
Thanks
On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu)
<[email protected]> wrote:
The data is not synchronised between the active and passive nodes. For clarity,
this is the HA model we had, much as described in the blog:
• 2 nodes, with Pacemaker in active-passive mode.
• Under Pacemaker control:
o We run MySQL in active-passive mode, using a single OpenStack volume which
we attach/reattach as the active role moves around nodes.
o As Pacemaker moves the volume, and thus MySQL, around on node failures,
ActiveMQ and Stratos are moved around too.
o Thus, everything operates in active-passive mode.
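The active-passive model above maps naturally onto a single Pacemaker resource group, whose members are started in the listed order, stopped in reverse order and always kept on the same node. The following is a hypothetical sketch using the pcs tool (crmsh would work equally well); the resource IDs, device path, MySQL paths and the activemq and stratos service names are all assumptions, and the site-specific step that attaches the OpenStack volume to the new active node is only indicated by a comment.

```shell
#!/bin/sh
# Hypothetical sketch of the active-passive model as one Pacemaker group.
# ASSUMED: resource IDs, /dev/vdb, mount point, config paths and the
# "activemq"/"stratos" service names; adjust to the actual installation.
# A site-specific resource that attaches the OpenStack volume to the node
# would have to be added to the group first; it is not shown here.

# Mount the shared volume that holds the MySQL data directory.
pcs resource create stratos-fs ocf:heartbeat:Filesystem \
    device=/dev/vdb directory=/var/lib/mysql fstype=ext4 --group stratos-ha

# Start MySQL on top of the mounted data directory.
pcs resource create stratos-mysql ocf:heartbeat:mysql \
    config=/etc/my.cnf datadir=/var/lib/mysql --group stratos-ha

# Then the message broker and Stratos itself, in that order.
pcs resource create stratos-activemq systemd:activemq --group stratos-ha
pcs resource create stratos-server lsb:stratos --group stratos-ha
```

With such a group, a node failure moves the volume, MySQL, ActiveMQ and Stratos together, which matches the behaviour described above.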
Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos
JVM on the old active node has gone with the node, and Pacemaker starts up a
new Stratos JVM on what used to be the passive node), we found that the
Cartridge Definition objects were missing and, as a clumsy workaround [1], we
had to replay the stored copies of them into Stratos using the REST API.
With Stratos 4.1, using the new object names, early indications are that
Deployment Policies and Application Deployment Policies are lost as the active
node fails over to the passive one. If anything, these objects are more likely
to hit the problem described in [1], since Stratos 4.1 expects them to be
tweaked on the fly (min/max etc.).
Thanks, Shaheed
[1] Clearly, this loses any changes that were not in the stored copies.
From: Imesh Gunaratne [mailto:[email protected]]
Sent: 03 May 2015 06:43
To: [email protected]
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,
Thanks for taking time to test this!
Just to clarify the exact problem, do you mean that data is not synchronized
between the active and passive nodes, or that it is not persisted in the active
node?
Thanks
On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu)
<[email protected]> wrote:
I have been looking into our use of Linux HA to set up an Active-Passive
configuration. Testing indicates that in 4.1 (beta1), several objects seem not
to be persisted properly. This includes at least:
- Cartridges
- Deployment policies
Am I missing something? Is it safe to work around this by replaying those
objects?
________________________________
From: Imesh Gunaratne [[email protected]]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,
Currently N-way clustering is still not possible with CC, AS & SM. We completed
the initial phase of this feature, but the feature as a whole is not complete.
You could refer to the mail thread "[Discuss] Clustering Feature Implementation
for 4.1.0-Alpha Release" for details.
However, [1] is valid at present: we could use Linux HA and deploy CC, AS and
SM in Active-Passive mode.
Thanks
On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu)
<[email protected]> wrote:
Hi,
We currently try to achieve HA with Stratos using something so unpleasant that
I won’t even describe it here ☺. It has been suggested that Stratos has, for a
while now, supported a clustered mode of deployment where, given N servers:
• The SM, AS and MB operate in an N-way clustered mode
• The CEP operates in an N-way load-sharing mode
• The Cartridge Agents can react to a failure in one of the N CEPs by
failing over to one of the other N-1 remaining servers
In looking for documentation on how to set this up, I came across these two
write-ups [1] and [2]. Questions:
• Both these documents mention only N=2. Is that still correct?
• [1] seems to have been written recently, and [2] is only a little older.
Are both documents still regarded as current?
Also, I’d love to hear any other experiences people have of running
configurations like this.
Thanks, Shaheed
[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
--
Imesh Gunaratne
Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
--
Imesh Gunaratne
Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos