On 05/29/2014 02:06 PM, Bartosz Kupidura wrote:
> Hello,
>
>
> Message written by Vladimir Kuklin <[email protected]> on 29 May
> 2014, at 12:09:
>
>> maybe the problem is that you are using lifetime crm attributes instead of
>> 'reboot' ones. shadow/commit is used by us because we need transactional
>> behaviour in some cases. if you turn crm_shadow off, then you will
>> experience problems with multi-state resources and location/colocation/order
>> constraints. so we need to find a way to make commits transactional. there
>> are two ways:
>> 1) rewrite corosync providers to use crm_diff command and apply it instead
>> of shadow commit that can swallow cluster attributes sometimes
>
> In the PoC I removed all cs_commit/cs_shadow, and it looks like everything
> is working. But as you say, this can lead to problems with more complicated
> deployments.
> This needs to be verified.
>
>> 2) store 'reboot' attributes instead of lifetime ones
>
> I tested with --lifetime forever and reboot. No difference for the
> cs_commit/cs_shadow failure.
>
> Moreover, we need a method to store GTID permanently (to support whole
> cluster reboot).

Please note, GTID could always be fetched from
/var/lib/mysql/grastate.dat on the galera node.

> If we want to stick to cs_commit/cs_shadow, we need other method to store
> GTID than crm_attribute.

We could use a modified ocf::pacemaker:SysInfo resource. We could put GTID
there and use it in a similar way as I did for the fencing PoC[0] (for free
space monitoring).

[0]
https://github.com/bogdando/fuel-library-1/blob/ha_fencing_WIP/deployment/puppet/cluster/manifests/fencing_primitives.pp#L41-L70

>
>>
>>
>>
>> On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya <[email protected]>
>> wrote:
>> On 05/27/14 16:44, Bartosz Kupidura wrote:
>>> Hello,
>>> Responses inline.
>>>
>>>
>>> Message written by Vladimir Kuklin <[email protected]> on 27
>>> May 2014, at 15:12:
>>>
>>>> Hi, Bartosz
>>>>
>>>> First of all, we are using openstack-dev for such discussions.
>>>>
>>>> Second, there is also Percona's RA for Percona XtraDB Cluster, which looks
>>>> pretty similar, although it is written in Perl. Maybe we could
>>>> derive something useful from it.
>>>>
>>>> Next, if you are working on this stuff, let's make it as open for the
>>>> community as possible. There is a blueprint for Galera OCF script:
>>>> https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script. It
>>>> would be awesome if you wrote down the specification and sent newer
>>>> galera ocf code change request to fuel-library gerrit.
>>>
>>> Sure, I will update this blueprint.
>>> Change request in fuel-library: https://review.openstack.org/#/c/95764/
>>
>> That is a really nice catch, Bartosz, thank you. I believe we should
>> review the new OCF script thoroughly and consider omitting
>> cs_commits/cs_shadows as well. What would be the downsides?
>>
>>>
>>>>
>>>> Speaking of crm_attribute stuff. I am very surprised that you are saying
>>>> that node attributes are altered by crm shadow commit. We are using
>>>> similar approach in our scripts and have never faced this issue.
>>>
>>> This is probably because you update crm_attribute very rarely. And with my
>>> approach the GTID attribute is updated every 60s on every node (3 updates
>>> in 60s, in a standard HA setup).
>>>
>>> You can try to update any attribute in a loop while deploying the cluster
>>> to trigger the failure with corosync diff.
>>
>> It sounds reasonable and we should verify it.
>> I've updated the statuses for related bugs and attached them to the
>> aforementioned blueprint as well:
>> https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
>> https://bugs.launchpad.net/fuel/+bug/1281592/comments/6
>>
>>
>>>
>>>>
>>>> Corosync 2.x support is in our roadmap, but we are not sure that we will
>>>> use Corosync 2.x earlier than 6.x release series start.
>>>
>>> Yeah, moreover corosync CMAP is not synced between cluster nodes (or maybe
>>> I'm doing something wrong?). So we need another solution for this...
>>>
>>
>> We should use CMAN for Corosync 1.x, perhaps.
>>
>>>>
>>>>
>>>> On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura <[email protected]>
>>>> wrote:
>>>> Hello guys!
>>>> I would like to start discussion on a new resource agent for
>>>> galera/pacemaker.
>>>>
>>>> Main features:
>>>> * Support cluster bootstrap
>>>> * Support reboot of any node in cluster
>>>> * Support reboot of whole cluster
>>>> * To determine which node has the latest DB version, we should use galera
>>>> GTID (Global Transaction ID)
>>>> * Node with latest GTID is galera PC (primary component) in case of
>>>> reelection
>>>> * Administrator can manually set node as PC
>>>>
>>>> GTID:
>>>> * get GTID from mysqld --wsrep-recover or SQL query 'SHOW STATUS LIKE
>>>> 'wsrep_local_state_uuid''
>>>> * store GTID as crm_attribute for node (crm_attribute --node $HOSTNAME
>>>> --lifetime $LIFETIME --name gtid --update $GTID)
>>>> * on every monitor/stop/start action update GTID for given node
>>>> * GTID can have 3 formats:
>>>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:123 - standard cluster-id:commit-id
>>>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:-1 - standard non initialized
>>>> cluster, 00000000-0000-0000-0000-000000000000:-1
>>>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:INF - commit-id manually set to
>>>> INF, force RA to create new cluster, with master on given node
>>>>
>>>> Check if reelection of PC is needed:
>>>> * (node is located in partition with quorum OR we have only 1 node
>>>> configured in cluster) AND galera resource is not running on any node
>>>> * GTID is manually set to INF on given node
>>>>
>>>> Check if given node is PC:
>>>> * has highest GTID in cluster; in case we have more than one node with
>>>> "highest" GTID, we use CRC32 to choose proper PC.
>>>> * GTID is manually set to INF
>>>> * in case node with highest GTID will not come back after cluster reboot
>>>> (for example disk failure) administrator should set GTID to INF on other
>>>> node
>>>>
>>>> I have almost ready RA: http://zynzel.spof.pl/mysql-wss
>>>>
>>>> Tested with vanilla centos galera/pacemaker/corosync - OK
>>>> Tested with Fuel 4.1 - Fail
>>>>
>>>>
>>>> Fuel 4.1 with that RA will not deploy correctly, because we use
>>>> crm_attribute to store GTID, and in manifest we use cs_shadow/cs_commit
>>>> for every pacemaker resource.
>>>> This leads to a cs_commit problem with different configuration in shadow
>>>> copy and running configuration (running config changed by RA).
>>>> "Could not commit shadow instance [..] to the CIB: Application of an
>>>> update diff failed"
>>>>
>>>> To solve this we can go in 2 ways:
>>>> 1) don't use cs_commit/cs_shadow in manifests
>>>> 2) store GTID in other way than crm_attribute
>>>>
>>>> IMHO 2) is better (less invasive) and we can store GTID in corosync CMAP
>>>> (http://www.polarhome.com/service/man/generic.php?qf=corosync-cmapctl),
>>>> but this requires corosync 2.X
>>>>
>>>>
>>>> --
>>>> Mailing list: https://launchpad.net/~fuel-dev
>>>> Post to : [email protected]
>>>> Unsubscribe : https://launchpad.net/~fuel-dev
>>>> More help : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>>
>>>> --
>>>> Yours Faithfully,
>>>> Vladimir Kuklin,
>>>> Fuel Library Tech Lead,
>>>> Mirantis, Inc.
>>>> +7 (495) 640-49-04
>>>> +7 (926) 702-39-68
>>>> Skype kuklinvv
>>>> 45bk3, Vorontsovskaya Str.
>>>> Moscow, Russia,
>>>> www.mirantis.com
>>>> www.mirantis.ru
>>>> [email protected]
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Bogdan Dobrelya,
>> Skype #bogdando_at_yahoo.com
>> Irc #bogdando
>>
>>
>>
>> --
>> Yours Faithfully,
>> Vladimir Kuklin,
>> Fuel Library Tech Lead,
>> Mirantis, Inc.
>> +7 (495) 640-49-04
>> +7 (926) 702-39-68
>> Skype kuklinvv
>> 45bk3, Vorontsovskaya Str.
>> Moscow, Russia,
>> www.mirantis.com
>> www.mirantis.ru
>> [email protected]
>
>
>

--
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando
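
For anyone following the thread, the GTID handling and PC election rules from the quoted proposal can be sketched roughly as below. This is only an illustration in Python, not the RA code itself. The function names (gtid_from_grastate, parse_gtid, elect_pc) are made up for this sketch. The thread does not say what the CRC32 is computed over or which value wins a tie, so hashing the node name and taking the largest checksum are assumptions. The grastate.dat layout ("uuid:" and "seqno:" lines) is what Galera normally writes, but treat that as an assumption and check it on your nodes. Nodes are compared by commit-id only, which is also an assumption.

```python
import zlib
from typing import Dict, Optional, Tuple


def gtid_from_grastate(text: str) -> str:
    """Build 'uuid:seqno' from the contents of a grastate.dat file.

    Assumes 'key: value' lines with 'uuid' and 'seqno' keys; comment
    lines starting with '#' are skipped.
    """
    fields = {}
    for line in text.splitlines():
        if line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return f"{fields['uuid']}:{fields['seqno']}"


def parse_gtid(gtid: str) -> Tuple[str, float]:
    """Split 'cluster-id:commit-id' into (cluster-id, commit-id).

    'INF' becomes float('inf'), so a node forced by the administrator
    always sorts above any real commit-id; '-1' stays -1.
    """
    uuid, _, seqno = gtid.strip().rpartition(":")
    if not uuid:
        raise ValueError(f"malformed GTID: {gtid!r}")
    if seqno == "INF":
        return uuid, float("inf")
    return uuid, float(int(seqno))


def elect_pc(gtids: Dict[str, str]) -> Optional[str]:
    """Return the node that should become PC, or None if no GTIDs are known.

    gtids maps node name -> GTID string as stored per node.
    """
    if not gtids:
        return None
    seqnos = {node: parse_gtid(gtid)[1] for node, gtid in gtids.items()}
    best = max(seqnos.values())
    candidates = [node for node, seqno in seqnos.items() if seqno == best]
    # Tie-break (assumption): largest CRC32 of the node name wins, so every
    # node reaches the same answer without talking to the others.
    return max(candidates, key=lambda node: zlib.crc32(node.encode("utf-8")))
```

Each node could run elect_pc over the GTIDs it sees for all cluster members and bootstrap only when the result equals its own name. Setting one node's commit-id to INF makes that node win regardless of the others.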

