Package: pacemaker
Version: 1.1.16-1+deb9u1
Severity: grave
X-Debbugs-CC: a...@debian.org
Hi,

I am running corosync 2.4.2-3+deb9u1 with pacemaker, and the last run of unattended-upgrades broke the cluster (downgrading pacemaker to 1.1.16-1 fixed it immediately; the rough commands I used are at the end of this report). The logs contain many warnings that point to a permission problem, such as "Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd". I am not using ACLs, so the patch should not affect my system.

Here is an excerpt from the logs after the upgrade:

Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition S_PENDING -> S_NOT_DC
Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition S_NOT_DC -> S_PENDING
Nov 12 06:26:05 cluster-1 attrd[20866]: notice: Defaulting to uname -n for the local corosync node name
Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition S_PENDING -> S_NOT_DC
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Could not add resource service to LRM cluster-1
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Invalid resource definition for service
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.11" subt="request" reference="lrm_invoke-tengine-xxx-29" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="cluster-1" src="cluster-2" acl_target="hacluster" crm_user="hacluster">
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input <crm_xml>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input <rsc_op id="5" operation="monitor" operation_key="service:1_monitor_0" on_node="cluster-1" on_node_uuid="xxx" transition-key="xxx">
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input <primitive id="service" long-id="service:1" class="systemd" type="service"/>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input <attributes CRM_meta_clone="1" CRM_meta_clone_max="2" CRM_meta_clone_node_max="1" CRM_meta_globally_unique="false" CRM_meta_notify="false" CRM_meta_op_target_rc="7" CRM_meta_timeout="15000" crm_feature_set="3.0.11"/>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input </rsc_op>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input </crm_xml>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input </create_request_adv>
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Resource service no longer exists in the lrmd
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Result of probe operation for service on cluster-1: Error
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Input I_FAIL received in state S_NOT_DC from get_lrm_resource
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: State transition S_NOT_DC -> S_RECOVERY
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Fast-tracking shutdown in response to errors
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: Disconnected from the LRM
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: Disconnected from Corosync
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Could not recover from internal error
Nov 12 06:26:06 cluster-1 pacemakerd[20857]: error: The crmd process (20868) exited: Generic Pacemaker error (201)
Nov 12 06:26:06 cluster-1 pacemakerd[20857]: notice: Respawning failed child process: crmd

My corosync.conf is quite standard:

totem {
    version: 2
    cluster_name: debian
    token: 0
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: aes256
    crypto_hash: sha256
    interface {
        ringnumber: 0
        bindnetaddr: xxx
        mcastaddr: yyy
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
}

So is my crm configuration:

node xxx: cluster-1 \
    attributes standby=off
node xxx: cluster-2 \
    attributes standby=off
primitive service systemd:service \
    meta failure-timeout=30 \
    op monitor interval=5 on-fail=restart timeout=15s
primitive vip-1 IPaddr2 \
    params ip=xxx cidr_netmask=32 \
    op monitor interval=10s
primitive vip-2 IPaddr2 \
    params ip=xxx cidr_netmask=32 \
    op monitor interval=10s
clone clone_service service
colocation service_vip-1 inf: vip-1 clone_service
colocation service_vip-2 inf: vip-2 clone_service
order kot_before_vip-1 inf: clone_service vip-1
order kot_before_vip-2 inf: clone_service vip-2
location prefer-cluster1-vip-1 vip-1 1: cluster-1
location prefer-cluster2-vip-2 vip-2 1: cluster-2
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.16-94ff4df \
    cluster-infrastructure=corosync \
    cluster-name=debian \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    cluster-recheck-interval=1m \
    last-lrm-refresh=1605159600
rsc_defaults rsc-options: \
    failure-timeout=5m \
    migration-threshold=1
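For reference, this is roughly how I rolled back and pinned the working version (a sketch from memory, not a verbatim transcript; the exact set of binary packages that needs the same version pin may differ):

# downgrade to the pre-update revision that still works here
apt-get install pacemaker=1.1.16-1
# keep unattended-upgrades from pulling 1.1.16-1+deb9u1 back in
apt-mark hold pacemaker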
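If it helps with triage, I can also post the output of some basic checks from the affected node; this is a sketch of what I would run (standard procps/coreutils commands, nothing pacemaker-specific):

# which user/group the rejecting daemon and its client actually run as
ps -o pid,user,group,comm -C lrmd,crmd
# confirm the hacluster account and haclient group are intact
id hacluster
getent group haclient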