Thanks for your help. I think I have it solved. The trick is that the crm tools also need to know what the Pacemaker IPC buffer size is. I have set:
/etc/sysconfig/pacemaker

    #export LRMD_MAX_CHILDREN="8"

    # Force use of a particular class of IPC connection
    # PCMK_ipc_type=shared-mem|socket|posix|sysv
    export PCMK_ipc_type=shared-mem

    # Specify an IPC buffer size in bytes
    # Useful when connecting to really big clusters that exceed the default 20k buffer
    # PCMK_ipc_buffer=20480
    export PCMK_ipc_buffer=20480000

and ~/.bashrc

    export PCMK_ipc_type=shared-mem
    export PCMK_ipc_buffer=20480000

And now everything seems to play nicely together. A 20MB buffer seems huge, but I have a TON of virtual machines on this cluster.

On Fri 30 Aug 2013 01:00:36 AM EDT, Andrew Beekhof wrote:
> You'd have to ask suse.
> They'd know what the old and new are and therefore the differences between the
> two.
>
> On 30/08/2013, at 2:21 PM, Tom Parker <[email protected]> wrote:
>
>> Do you know if this has changed significantly from the older versions?
>> This cluster was working fine before the upgrade.
>>
>> On Fri 30 Aug 2013 12:16:35 AM EDT, Andrew Beekhof wrote:
>>>
>>> On 30/08/2013, at 1:42 PM, Tom Parker <[email protected]> wrote:
>>>
>>>> My pacemaker config contains the following settings:
>>>>
>>>> LRMD_MAX_CHILDREN="8"
>>>> export PCMK_ipc_buffer=3172882
>>>
>>> perhaps go higher
>>>
>>>>
>>>> This is what I had today to get to 127 Resources defined. I am not sure
>>>> what I should choose for the PCMK_ipc_type. Do you have any suggestions
>>>> for large clusters?
>>>
>>> shm is the new upstream default, but it may not have propagated to suse yet.
>>>
>>>>
>>>> Thanks
>>>>
>>>> Tom
>>>>
>>>> On 08/29/2013 11:19 PM, Andrew Beekhof wrote:
>>>>> On 30/08/2013, at 5:49 AM, Tom Parker <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> Last night I updated my SLES 11 servers to HAE-SP3 which contains
>>>>>> the following versions of software:
>>>>>>
>>>>>> cluster-glue-1.0.11-0.15.28
>>>>>> libcorosync4-1.4.5-0.18.15
>>>>>> corosync-1.4.5-0.18.15
>>>>>> pacemaker-mgmt-2.1.2-0.7.40
>>>>>> pacemaker-mgmt-client-2.1.2-0.7.40
>>>>>> pacemaker-1.1.9-0.19.102
>>>>>>
>>>>>> With the previous versions of openais/corosync I could run over 200
>>>>>> resources with no problems and with very little lag with the management
>>>>>> commands (crm_mon, crm configure, etc)
>>>>>>
>>>>>> Today I am unable to configure more than 127 resources. When I commit
>>>>>> my 128th resource all the crm commands start to fail (crm_mon just
>>>>>> hangs) or time out (ERROR: running cibadmin -Ql: Call cib_query failed
>>>>>> (-62): Timer expired)
>>>>>>
>>>>>> I have attached my original crm config with 201 primitives to this
>>>>>> e-mail.
>>>>>>
>>>>>> If anyone has any ideas as to what may have changed between pacemaker
>>>>>> versions that would cause this please let me know. If I can't get this
>>>>>> solved this week I will have to downgrade to SP2 again.
>>>>>>
>>>>>> Thanks for any information.
>>>>>>
>>>>> I suspect you've hit an IPC buffer limit.
>>>>>
>>>>> Depending on exactly what went into the SUSE builds, you should have the
>>>>> following environment variables (documentation from
>>>>> /etc/sysconfig/pacemaker on RHEL) to play with:
>>>>>
>>>>> # Force use of a particular class of IPC connection
>>>>> # PCMK_ipc_type=shared-mem|socket|posix|sysv
>>>>>
>>>>> # Specify an IPC buffer size in bytes
>>>>> # Useful when connecting to really big clusters that exceed the default 20k buffer
>>>>> # PCMK_ipc_buffer=20480
>>>>>
>>>>> _______________________________________________
>>>>> Linux-HA mailing list
>>>>> [email protected]
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>
>>>>> See also:
>>>>> http://linux-ha.org/ReportingProblems
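
For anyone hitting the same thing: the fix at the top of this message only works if the shell running the crm tools sees the same PCMK_ipc_buffer value as the daemons' config file. Below is a rough sketch of a check for that. The function names (pcmk_buffer_from_file, check_pcmk_buffer) are made up for illustration and are not part of Pacemaker, and the sketch assumes the sysconfig file contains only plain shell assignments and comments, since it is sourced in a subshell.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Pacemaker): compare the PCMK_ipc_buffer
# value exported by a sysconfig-style file with the one in the current shell.

# Print the PCMK_ipc_buffer value the file sets (empty line if it sets none).
# Assumption: the file is plain shell (assignments and comments), so it can
# be sourced safely. Pass a path containing a slash so "." does not search PATH.
pcmk_buffer_from_file() {
    [ -r "$1" ] || return 1
    (
        # Subshell: nothing done here leaks into the caller's environment.
        unset PCMK_ipc_buffer
        . "$1" >/dev/null 2>&1
        printf '%s\n' "${PCMK_ipc_buffer:-}"
    )
}

# Return 0 if the file and the current environment agree, 1 if they differ,
# 2 if the file is unreadable or sets no value at all.
check_pcmk_buffer() {
    file=$1
    file_val=$(pcmk_buffer_from_file "$file") || file_val=""
    env_val=${PCMK_ipc_buffer:-}
    if [ -z "$file_val" ]; then
        echo "no PCMK_ipc_buffer set in $file"
        return 2
    fi
    if [ "$file_val" = "$env_val" ]; then
        echo "match: $file_val bytes"
        return 0
    fi
    echo "mismatch: file=$file_val env=${env_val:-unset}"
    return 1
}

# Example (path as used in this thread):
#   check_pcmk_buffer /etc/sysconfig/pacemaker
```

A mismatch or an unset value in the shell would be consistent with the crm_mon hangs and cib_query timeouts described in the quoted thread, though this check by itself does not prove that is the cause.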
