Hello,
On 29 May 2014, at 12:09, Vladimir Kuklin <[email protected]> wrote:

> Maybe the problem is that you are using lifetime crm attributes instead of
> 'reboot' ones. shadow/commit is used by us because we need transactional
> behaviour in some cases. If you turn crm_shadow off, then you will experience
> problems with multi-state resources and location/colocation/order
> constraints. So we need to find a way to make commits transactional. There
> are two ways:
> 1) rewrite corosync providers to use crm_diff command and apply it instead of
> shadow commit that can swallow cluster attributes sometimes

In my PoC I removed all cs_commit/cs_shadow calls, and everything appears to
work. But as you say, this can lead to problems with more complicated
deployments. This needs to be verified.

> 2) store 'reboot' attributes instead of lifetime ones

I tested with both --lifetime forever and --lifetime reboot. It makes no
difference: cs_commit/cs_shadow fails either way. Moreover, we need a way to
store the GTID permanently (to support a reboot of the whole cluster). If we
want to stick with cs_commit/cs_shadow, we need a method other than
crm_attribute to store the GTID.

>
>
> On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya <[email protected]>
> wrote:
> On 05/27/14 16:44, Bartosz Kupidura wrote:
> > Hello,
> > Responses inline.
> >
> >
> > On 27 May 2014, at 15:12, Vladimir Kuklin <[email protected]> wrote:
> >
> >> Hi, Bartosz
> >>
> >> First of all, we are using openstack-dev for such discussions.
> >>
> >> Second, there is also Percona's RA for Percona XtraDB Cluster, which looks
> >> pretty similar, although it is written in Perl. Maybe we could
> >> derive something useful from it.
> >>
> >> Next, if you are working on this stuff, let's make it as open for the
> >> community as possible. There is a blueprint for Galera OCF script:
> >> https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script.
> >> It would be awesome if you wrote down the specification and sent newer
> >> galera ocf code change request to fuel-library gerrit.
> >
> > Sure, I will update this blueprint.
> > Change request in fuel-library: https://review.openstack.org/#/c/95764/
>
> That is a really nice catch, Bartosz, thank you. I believe we should
> review the new OCF script thoroughly and consider omitting
> cs_commits/cs_shadows as well. What would be the downsides?
>
> >
> >>
> >> Speaking of crm_attribute stuff. I am very surprised that you are saying
> >> that node attributes are altered by crm shadow commit. We are using
> >> similar approach in our scripts and have never faced this issue.
> >
> > This is probably because you update crm_attribute very rarely. And with my
> > approach GTID attribute is updated every 60s on every node (3 updates in
> > 60s, in standard HA setup).
> >
> > You can try to update any attribute in loop during deploying cluster to
> > trigger fail with corosync diff.
>
> It sounds reasonable and we should verify it.
> I've updated the statuses for related bugs and attached them to the
> aforementioned blueprint as well:
> https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
> https://bugs.launchpad.net/fuel/+bug/1281592/comments/6
>
> >
> >>
> >> Corosync 2.x support is in our roadmap, but we are not sure that we will
> >> use Corosync 2.x earlier than 6.x release series start.
> >
> > Yeah, moreover corosync CMAP is not synced between cluster nodes (or maybe
> > I'm doing something wrong?). So we need another solution for this...
> >
>
> We should use CMAN for Corosync 1.x, perhaps.
>
> >>
> >>
> >> On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura <[email protected]>
> >> wrote:
> >> Hello guys!
> >> I would like to start discussion on a new resource agent for
> >> galera/pacemaker.
> >>
> >> Main features:
> >> * Support cluster bootstrap
> >> * Support reboot of any node in cluster
> >> * Support reboot of whole cluster
> >> * To determine which node has the latest DB version, we should use galera
> >> GTID (Global Transaction ID)
> >> * Node with latest GTID is galera PC (primary component) in case of
> >> reelection
> >> * Administrator can manually set node as PC
> >>
> >> GTID:
> >> * get GTID from mysqld --wsrep-recover or SQL query 'SHOW STATUS LIKE
> >> 'wsrep_local_state_uuid''
> >> * store GTID as crm_attribute for node (crm_attribute --node $HOSTNAME
> >> --lifetime $LIFETIME --name gtid --update $GTID)
> >> * on every monitor/stop/start action update GTID for given node
> >> * GTID can have 3 formats:
> >> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:123 - standard cluster-id:commit-id
> >> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:-1 - standard non-initialized
> >> cluster, 00000000-0000-0000-0000-000000000000:-1
> >> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:INF - commit-id manually set to
> >> INF, force RA to create new cluster, with master on given node
> >>
> >> Check if reelection of PC is needed:
> >> * (node is located in partition with quorum OR we have only 1 node
> >> configured in cluster) AND galera resource is not running on any node
> >> * GTID is manually set to INF on given node
> >>
> >> Check if given node is PC:
> >> * have highest GTID in cluster, in case we have more than one node with
> >> "highest" GTID, we use CRC32 to choose proper PC.
> >> * GTID is manually set to INF
> >> * in case node with highest GTID will not come back after cluster reboot
> >> (for example disk failure) administrator should set GTID to INF on other
> >> node
> >>
> >> I have almost ready RA: http://zynzel.spof.pl/mysql-wss
> >>
> >> Tested with vanilla CentOS galera/pacemaker/corosync - OK
> >> Tested with Fuel 4.1 - Fail
> >>
> >>
> >> Fuel 4.1 with that RA will not deploy correctly, because we use
> >> crm_attribute to store GTID, and in manifest we use cs_shadow/cs_commit
> >> for every pacemaker resource.
> >> This leads to a cs_commit problem with different configuration in shadow
> >> copy and running configuration (running config changed by RA).
> >> "Could not commit shadow instance [..] to the CIB: Application of an
> >> update diff failed"
> >>
> >> To solve this we can go in 2 ways:
> >> 1) don't use cs_commit/cs_shadow in manifests
> >> 2) store GTID in other way than crm_attribute
> >>
> >> IMHO 2) is better (less invasive) and we can store GTID in corosync CMAP
> >> (http://www.polarhome.com/service/man/generic.php?qf=corosync-cmapctl),
> >> but this requires corosync 2.X
> >>
> >>
> >> --
> >> Mailing list: https://launchpad.net/~fuel-dev
> >> Post to : [email protected]
> >> Unsubscribe : https://launchpad.net/~fuel-dev
> >> More help : https://help.launchpad.net/ListHelp
> >>
> >>
> >>
> >> --
> >> Yours Faithfully,
> >> Vladimir Kuklin,
> >> Fuel Library Tech Lead,
> >> Mirantis, Inc.
> >> +7 (495) 640-49-04
> >> +7 (926) 702-39-68
> >> Skype kuklinvv
> >> 45bk3, Vorontsovskaya Str.
> >> Moscow, Russia,
> >> www.mirantis.com
> >> www.mirantis.ru
> >> [email protected]
> >
> >
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Skype #bogdando_at_yahoo.com
> Irc #bogdando
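
For reference, the PC-selection rules from the original proposal (highest
commit-id wins, a commit-id of INF forces a node, CRC32 breaks ties) can be
sketched roughly as below. This is only an illustration, not the code of the
RA linked in the thread: the function names parse_gtid and choose_pc are
hypothetical, and the thread does not say what is fed to CRC32 or which value
wins, so hashing the node name and taking the lowest checksum is an
assumption. Comparing cluster UUIDs across nodes is left out.

```python
import zlib
from typing import Dict, Optional, Tuple

INF = float("inf")


def parse_gtid(gtid: str) -> Tuple[str, float]:
    """Split 'cluster-id:commit-id' into (cluster_id, commit_id).

    The commit-id may be a number, -1 (non-initialized), or the literal INF.
    """
    cluster_id, _, commit_id = gtid.rpartition(":")
    if not cluster_id:
        raise ValueError(f"malformed GTID: {gtid!r}")
    return cluster_id, INF if commit_id == "INF" else int(commit_id)


def choose_pc(gtids: Dict[str, str]) -> Optional[str]:
    """Pick the node that should become the primary component (PC).

    gtids maps node name -> GTID string, as it might be read back from the
    per-node 'gtid' attribute. Returns None when no node reported a GTID.
    """
    if not gtids:
        return None
    commit_ids = {node: parse_gtid(g)[1] for node, g in gtids.items()}
    highest = max(commit_ids.values())
    candidates = [n for n, c in commit_ids.items() if c == highest]
    # ASSUMPTION: tie-break on CRC32 of the node name, lowest value wins.
    # The node name itself is a secondary key so the result is deterministic
    # on every node even if two names share a checksum.
    return min(candidates, key=lambda n: (zlib.crc32(n.encode()), n))
```

Every node evaluating choose_pc over the same set of attributes gets the same
answer, which is the property the tie-break needs; any other deterministic
ordering would serve equally well.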

