Package: pacemaker
Version: 1.1.7-1
Severity: normal

Using pacemaker from wheezy, I found that on-fail settings are not honored on clones and master/slave resources. The problem has already been reported upstream and they have released a fix; I'm asking for the attached fix to be included in Debian.
The attached patch is the upstream patch with minor (cosmetic) differences so that it applies cleanly to the Debian sources.

Thanks!

before patch:

# crm resource show msPostgresql
resource msPostgresql is running on: infra02
resource msPostgresql is running on: infra01 Master

# crm configure show msPostgresql
ms msPostgresql pgsql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" is-managed="true"

# crm configure show pgsql
primitive pgsql ocf:local:pgsql \
    params pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" psql="/usr/bin/psql" pgdata="/var/lib/postgresql/9.1/main" start_opt="-p 5432" rep_mode="sync" node_list="infra01 infra02" restore_command="cp /var/lib/postgresql/9.1/archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="192.168.111.12" stop_escalate="0" config="/etc/postgresql/9.1/main/postgresql.conf" tmpdir="/var/lib/postgresql/tmp" pgctldata="/usr/lib/postgresql/9.1/bin/pg_controldata" repuser="repl" \
    op start interval="0" timeout="120" on-fail="restart" \
    op monitor interval="7" timeout="120" on-fail="stop" \
    op monitor interval="2" role="Master" timeout="60" on-fail="restart" \
    op promote interval="0" timeout="120" on-fail="restart" \
    op demote interval="0" timeout="120" on-fail="stop" \
    op stop interval="0" timeout="120" on-fail="block" \
    op notify interval="0" timeout="90"

# kill `cat /var/run/postgresql/9.1-main.pid `

pgsql log

Apr 15 16:12:17 infra02 postgres[39723]: [2-1] 2013-04-15 16:12:17 ART LOG: received smart shutdown request
Apr 15 16:12:17 infra02 postgres[39769]: [1-1] 2013-04-15 16:12:17 ART LOG: shutting down
Apr 15 16:12:17 infra02 postgres[39769]: [2-1] 2013-04-15 16:12:17 ART LOG: database system is shut down

cluster log

Apr 15 16:12:17 infra02 pgsql[41389]: INFO: PostgreSQL is down
Apr 15 16:12:17 infra02 crmd: [1441]: info: process_lrm_event: LRM operation pgsql:0_monitor_7000 (call=84, rc=7, cib-update=89, confirmed=false) not running
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_ais_dispatch: Update relayed from infra01
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-pgsql:0 (13)
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_perform_update: Sent update 270: fail-count-pgsql:0=13
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_ais_dispatch: Update relayed from infra01
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-pgsql:0 (1366053137)
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_perform_update: Sent update 272: last-failure-pgsql:0=1366053137
Apr 15 16:12:17 infra02 lrmd: [1438]: info: rsc:pgsql:0 notify[85] (pid 41435)
Apr 15 16:12:17 infra02 lrmd: [1438]: info: operation notify[85] on pgsql:0 for client 1441: pid 41435 exited with return code 0
Apr 15 16:12:17 infra02 crmd: [1441]: info: process_lrm_event: LRM operation pgsql:0_notify_0 (call=85, rc=0, cib-update=0, confirmed=true) ok
Apr 15 16:12:17 infra02 lrmd: [1438]: info: cancel_op: operation monitor[84] on pgsql:0 for client 1441, its parameters: pgctl=[/usr/lib/postgresql/9.1/bin/pg_ctl] CRM_meta_clone=[0] config=[/etc/postgresql/9.1/main/postgresql.conf] CRM_meta_clone_max=[2] CRM_meta_globally_unique=[false] CRM_meta_notify_master_uname=[infra01 ] CRM_meta_notify_promote_uname=[ ] tmpdir=[/var/lib/postgresql/tmp] CRM_meta_notify_active_uname=[ ] start_opt=[-p 5432] CRM_meta_notify_stop_resource=[ ] CRM_meta_name=[monitor] CRM_meta_interval=[7000] CRM_meta_clone_node_max=[1] crm_fe cancelled
Apr 15 16:12:17 infra02 lrmd: [1438]: info: rsc:pgsql:0 stop[86] (pid 41471)
Apr 15 16:12:17 infra02 crmd: [1441]: info: process_lrm_event: LRM operation pgsql:0_monitor_7000 (call=84, status=1, cib-update=0, confirmed=true) Cancelled
Apr 15 16:12:17 infra02 pgsql[41471]: INFO: PostgreSQL is already stopped.
Apr 15 16:12:17 infra02 pgsql[41471]: INFO: Changing pgsql-status on infra02 : HS:alone->STOP.
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Apr 15 16:12:17 infra02 attrd: [1439]: notice: attrd_perform_update: Sent update 274: pgsql-status=STOP
Apr 15 16:12:17 infra02 lrmd: [1438]: info: operation stop[86] on pgsql:0 for client 1441: pid 41471 exited with return code 0
Apr 15 16:12:17 infra02 crmd: [1441]: info: process_lrm_event: LRM operation pgsql:0_stop_0 (call=86, rc=0, cib-update=90, confirmed=true) ok
Apr 15 16:12:18 infra02 lrmd: [1438]: info: rsc:pgsql:0 start[87] (pid 41525)
Apr 15 16:12:18 infra02 pgsql[41525]: INFO: Set all nodes into async mode.
Apr 15 16:12:18 infra02 pgsql[41525]: INFO: My Timeline ID and Checkpoint : 7:00000000160000D0
Apr 15 16:12:18 infra02 pgsql[41525]: INFO: infra01 master baseline : 7:0000000017000070
Apr 15 16:12:18 infra02 pgsql[41525]: INFO: server starting
Apr 15 16:12:18 infra02 pgsql[41525]: INFO: PostgreSQL start command sent.
Apr 15 16:12:18 infra02 lrmd: [1438]: info: RA output: (pgsql:0:start:stderr) psql: could not connect to server: No such file or directory#012#011Is the server running locally and accepting#012#011connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
Apr 15 16:12:18 infra02 pgsql[41525]: WARNING: PostgreSQL template1 isn't running
Apr 15 16:12:18 infra02 pgsql[41525]: WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command.
Apr 15 16:12:19 infra02 pgsql[41525]: INFO: PostgreSQL is started.
Apr 15 16:12:19 infra02 pgsql[41525]: INFO: Changing pgsql-status on infra02 : STOP->HS:alone.
Apr 15 16:12:19 infra02 attrd: [1439]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
Apr 15 16:12:19 infra02 attrd: [1439]: notice: attrd_perform_update: Sent update 276: pgsql-status=HS:alone
Apr 15 16:12:19 infra02 lrmd: [1438]: info: operation start[87] on pgsql:0 for client 1441: pid 41525 exited with return code 0
Apr 15 16:12:19 infra02 crmd: [1441]: info: process_lrm_event: LRM operation pgsql:0_start_0 (call=87, rc=0, cib-update=91, confirmed=true) ok
Apr 15 16:12:19 infra02 lrmd: [1438]: info: rsc:pgsql:0 notify[88] (pid 41771)
Apr 15 16:12:19 infra02 lrmd: [1438]: info: operation notify[88] on pgsql:0 for client 1441: pid 41771 exited with return code 0
Apr 15 16:12:19 infra02 crmd: [1441]: info: process_lrm_event: LRM operation pgsql:0_notify_0 (call=88, rc=0, cib-update=0, confirmed=true) ok
Apr 15 16:12:19 infra02 crmd: [1441]: info: process_lrm_event: LRM operation pgsql:0_monitor_7000 (call=89, rc=0, cib-update=92, confirmed=false) ok

after patch:

# kill `cat /var/run/postgresql/9.1-main.pid `

cluster log

Apr 16 11:21:05 infra02 pgsql[100164]: INFO: PostgreSQL is down
Apr 16 11:21:05 infra02 crmd: [97198]: info: process_lrm_event: LRM operation pgsql:0_monitor_7000 (call=15, rc=7, cib-update=24, confirmed=false) not running
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_ais_dispatch: Update relayed from infra01
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-pgsql:0 (1)
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_perform_update: Sent update 47: fail-count-pgsql:0=1
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_ais_dispatch: Update relayed from infra01
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-pgsql:0 (1366122065)
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_perform_update: Sent update 50: last-failure-pgsql:0=1366122065
Apr 16 11:21:05 infra02 lrmd: [97195]: info: rsc:pgsql:0 notify[24] (pid 100206)
Apr 16 11:21:05 infra02 lrmd: [97195]: info: operation notify[24] on pgsql:0 for client 97198: pid 100206 exited with return code 0
Apr 16 11:21:05 infra02 crmd: [97198]: info: process_lrm_event: LRM operation pgsql:0_notify_0 (call=24, rc=0, cib-update=0, confirmed=true) ok
Apr 16 11:21:05 infra02 lrmd: [97195]: info: cancel_op: operation monitor[15] on pgsql:0 for client 97198, its parameters: pgctl=[/usr/lib/postgresql/9.1/bin/pg_ctl] CRM_meta_clone=[0] config=[/etc/postgresql/9.1/main/postgresql.conf] CRM_meta_clone_max=[2] CRM_meta_globally_unique=[false] CRM_meta_notify_master_uname=[ ] CRM_meta_notify_promote_uname=[ ] tmpdir=[/var/lib/postgresql/tmp] CRM_meta_notify_active_uname=[ ] start_opt=[-p 5432] CRM_meta_notify_stop_resource=[ ] CRM_meta_name=[monitor] CRM_meta_interval=[7000] CRM_meta_clone_node_max=[1] crm_feature_ cancelled
Apr 16 11:21:05 infra02 lrmd: [97195]: info: rsc:pgsql:0 stop[25] (pid 100241)
Apr 16 11:21:05 infra02 crmd: [97198]: info: process_lrm_event: LRM operation pgsql:0_monitor_7000 (call=15, status=1, cib-update=0, confirmed=true) Cancelled
Apr 16 11:21:05 infra02 pgsql[100241]: INFO: PostgreSQL is already stopped.
Apr 16 11:21:05 infra02 pgsql[100241]: INFO: Changing pgsql-status on infra02 : HS:alone->STOP.
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Apr 16 11:21:05 infra02 lrmd: [97195]: info: operation stop[25] on pgsql:0 for client 97198: pid 100241 exited with return code 0
Apr 16 11:21:05 infra02 attrd: [97196]: notice: attrd_perform_update: Sent update 52: pgsql-status=STOP
Apr 16 11:21:05 infra02 crmd: [97198]: info: process_lrm_event: LRM operation pgsql:0_stop_0 (call=25, rc=0, cib-update=25, confirmed=true) ok

-- System Information:
Debian Release: 7.0
  APT prefers testing
  APT policy: (900, 'testing'), (500, 'testing-updates'), (300, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.2.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
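
To compare the two runs without reading the whole cluster log, a small shell check along these lines could be used. It is only a sketch: the function name restarted_after_failure is made up for illustration, and it assumes the crmd entries keep the "LRM operation <resource>_<op>_<interval>" form seen in the logs above.

```shell
# Report whether a resource was started again after its monitor returned rc=7.
# usage: restarted_after_failure RESOURCE < cluster.log
# (hypothetical helper, not part of pacemaker or crmsh)
restarted_after_failure() {
    awk -v rsc="$1" '
        # A failed recurring monitor, e.g. "LRM operation pgsql:0_monitor_7000 (... rc=7 ...)"
        index($0, "LRM operation " rsc "_monitor_") && /rc=7/ { failed = 1; next }
        # A start result logged after that failure means the resource was restarted.
        failed && index($0, "LRM operation " rsc "_start_0") { restarted = 1 }
        # Exit 0 if a restart was seen, 1 otherwise.
        END { exit(restarted ? 0 : 1) }
    '
}
```

For example, `restarted_after_failure pgsql:0 < /var/log/syslog` (the log path is an assumption and depends on the local syslog setup). With on-fail="stop" on the 7 second monitor, the expected result is a non-zero exit status, as in the after-patch log.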
Description: fixes a bug on cloned and master/slave resources handling
 during failures.
 .
Author: gustavo panizzo <[email protected]>
Origin: upstream, https://github.com/beekhof/pacemaker/commit/6a48a8b
Bug-Debian:
Forwarded: not-needed
Last-Update: <2013-04-16>

--- pacemaker-1.1.7.orig/lib/pengine/utils.c
+++ pacemaker-1.1.7/lib/pengine/utils.c
@@ -544,7 +544,6 @@ unpack_operation(action_t * action, xmlN
 
     unpack_instance_attributes(data_set->input, xml_obj, XML_TAG_ATTR_SETS,
                                NULL, action->meta, NULL, FALSE, data_set->now);
-
     g_hash_table_remove(action->meta, "id");
 
     class = g_hash_table_lookup(action->rsc->meta, "class");
@@ -785,12 +784,19 @@ find_rsc_op_entry(resource_t * rsc, cons
             }
 
             match_key = generate_op_key(rsc->id, name, number);
-
             if (safe_str_eq(key, match_key)) {
                 op = operation;
             }
             crm_free(match_key);
 
+            if(rsc->clone_name) {
+                match_key = generate_op_key(rsc->clone_name, name, number);
+                if (safe_str_eq(key, match_key)) {
+                    op = operation;
+                }
+                crm_free(match_key);
+            }
+
             if (op != NULL) {
                 crm_free(local_key);
                 return op;
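
As a rough illustration of what the second hunk changes: find_rsc_op_entry() compares the requested key against a key generated from rsc->id, and with the patch also against one generated from rsc->clone_name. The shell sketch below mimics that comparison. It is not pacemaker code: the function names op_key and key_matches are hypothetical, the key layout is inferred from log entries such as pgsql:0_monitor_7000, and the names pgsql and pgsql:0 in the usage note are only assumed values for the two fields.

```shell
# Build an operation key in the <resource>_<operation>_<interval-ms> form
# seen in the cluster log (layout inferred from the log, an assumption).
op_key() {
    printf '%s_%s_%s\n' "$1" "$2" "$3"
}

# Succeed if WANTED equals a key built from any of the candidate names.
# usage: key_matches WANTED OP INTERVAL_MS NAME...
key_matches() {
    wanted=$1
    op=$2
    interval=$3
    shift 3
    for name in "$@"; do
        if [ "$(op_key "$name" "$op" "$interval")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

With a single candidate name, `key_matches pgsql:0_monitor_7000 monitor 7000 pgsql` fails, so no op entry would be found for that key. With a second candidate, `key_matches pgsql:0_monitor_7000 monitor 7000 pgsql pgsql:0` succeeds, which corresponds to the extra comparison the patch adds.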

