Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread Andrew Beekhof
On 11 Nov 2013, at 5:08 pm, yusuke iida yusk.i...@gmail.com wrote: Hi Andrew, I tested with the following version: https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc However, the problem has not been solved yet. I do not think that this problem can

[Pacemaker] recover cib from raw file

2013-11-11 Thread s.oreilly
Hi, Is it possible to recover/replace cib.xml from one of the raw files in /var/lib/pacemaker/cib? I would like to reset the cib to the configuration referenced in cib.last, in this case cib-89.raw. I haven't been able to find a command to do this. Thanks Sean O'Reilly

Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread yusuke iida
Hi Andrew, I checked the log of the DC. Judging from the following log, the change of batch-limit seems to have succeeded. The initial value is 0, and batch-limit is changed to 16 when the load becomes high. # grep throttle_get_total_job_limit pacemaker.log Nov 08 15:26:05 [2473]

[Pacemaker] stonith_admin does not work as expected

2013-11-11 Thread andreas graeper
hi, two nodes. n1 (slave) fence_2:stonith:fence_ifmib n2 (master) fence_1:stonith:fence_ifmib n1 was fenced because it was suddenly not reachable (reason still unknown). n2 stonith_admin -L - 'fence_1' n2 stonith_admin -U fence_1 timed out n2 stonith_admin -L - 'no devices found' crm_mon

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Vladislav Bogdanov
11.11.2013 09:00, Vladislav Bogdanov wrote: ... Looking at crm-fence-peer.sh script, it would determine peer state as offline immediately if node state (all of) * doesn't contain expected tag or has it set to down * has in_ccm tag set to false * has crmd tag set to anything except online On
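
The three checks listed above can be sketched as a single predicate. This is only an illustration of the condition as described in the message, not the actual crm-fence-peer.sh code; the function name and the representation of the node state as a dictionary of string values are assumptions made for the example.

```python
def peer_looks_offline(node_state: dict) -> bool:
    """Return True when all three conditions from the message hold.

    node_state is a hypothetical dict holding the node state values
    named in the message: "expected", "in_ccm" and "crmd".
    """
    expected = node_state.get("expected")
    # Condition 1: "expected" is missing, or it is set to "down".
    expected_down = expected is None or expected == "down"
    # Condition 2: "in_ccm" is set to "false".
    not_in_ccm = node_state.get("in_ccm") == "false"
    # Condition 3: "crmd" is anything except "online".
    crmd_not_online = node_state.get("crmd") != "online"
    # The message says "all of", so the conditions are combined with AND.
    return expected_down and not_in_ccm and crmd_not_online
```

For example, a state with no expected value, in_ccm set to false and crmd set to offline satisfies all three checks, while a state that still carries expected=member does not.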

Re: [Pacemaker] quorum lost, Multiple primaries not allowed by config

2013-11-11 Thread Mistina Michal
Hi Digimer, Thank you for the reply. I know the stonith is crucial for avoiding split-brain situations. However I don't have this option, because I am running servers on the ESX as virtual machines. There is only one option - using stonith agent fence_vmware_soap. But the connection from vmware

Re: [Pacemaker] quorum lost, Multiple primaries not allowed by config

2013-11-11 Thread Digimer
On 11/11/13 11:35, Mistina Michal wrote: Hi Digimer, Thank you for the reply. I know the stonith is crucial for avoiding split-brain situations. However I don't have this option, because I am running servers on the ESX as virtual machines. There is only one option - using stonith agent

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-11 Thread emmanuel segura
you can find the two files in the attachment Thanks 2013/11/11 Andrew Beekhof and...@beekhof.net On 9 Nov 2013, at 8:56 am, emmanuel segura emi2f...@gmail.com wrote: Hello Andrew, You can find the file in the attachment. It would be very useful to know what is NULL at: 1196

Re: [Pacemaker] recover cib from raw file

2013-11-11 Thread Andrew Beekhof
On 11 Nov 2013, at 9:41 pm, s.oreilly s.orei...@linnovations.co.uk wrote: Hi, Is it possible to recover/replace cib.xml from one of the raw files in /var/lib/pacemaker/cib? I would like to reset the cib to the configuration referenced in cib.last, in this case cib-89.raw. I haven't

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-11 Thread Andrew Beekhof
On 12 Nov 2013, at 7:23 am, emmanuel segura emi2f...@gmail.com wrote: you can find the two files in the attachment thanks. it looks like David fixed this a while back in: https://github.com/beekhof/pacemaker/commit/b32b60e I'll try and get a build done for fedora that includes this fix.

Re: [Pacemaker] stonith_admin does not work as expected

2013-11-11 Thread Andrew Beekhof
Impossible to comment without knowing the pacemaker version, full config, and how fence_ifmib works (I assume it's a custom agent?) On 12 Nov 2013, at 1:21 am, andreas graeper agrae...@googlemail.com wrote: hi, two nodes. n1 (slave) fence_2:stonith:fence_ifmib n2 (master)

Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread Andrew Beekhof
On 11 Nov 2013, at 11:48 pm, yusuke iida yusk.i...@gmail.com wrote: Execution of the graph was also checked. Since the number of pending(s) is restricted to 16 from the middle, it is judged that batch-limit is effective. Observing here, even if a job is restricted by batch-limit, two or

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Andrew Beekhof
On 12 Nov 2013, at 10:29 am, Andrew Beekhof and...@beekhof.net wrote: On 12 Nov 2013, at 2:46 am, Vladislav Bogdanov bub...@hoster-ok.com wrote: 11.11.2013 09:00, Vladislav Bogdanov wrote: ... Looking at crm-fence-peer.sh script, it would determine peer state as offline immediately if

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Andrew Beekhof
Can you try with these two patches please? + Andrew Beekhof (4 seconds ago) fec946a: Fix: crmd: When the DC gracefully shuts down, record the new expected state into the cib (HEAD, master) + Andrew Beekhof (10 seconds ago) 740122a: Fix: crmd: When a peer expectedly shuts down, record the new

Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread yusuke iida
Hi Andrew, I'm sorry. This report was taken when two cores were assigned to the virtual machine. https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing I'm sorry for the confusion. This is the report acquired with one core.

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-11 Thread Andrey Groshev
11.11.2013, 03:44, Andrew Beekhof and...@beekhof.net: On 8 Nov 2013, at 7:49 am, Andrey Groshev gre...@yandex.ru wrote: Hi, PPL! I need help. I do not understand why it has stopped working. This configuration works on another cluster, but on corosync1. So... cluster postgres with

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Vladislav Bogdanov
12.11.2013 03:05, Andrew Beekhof wrote: On 12 Nov 2013, at 10:29 am, Andrew Beekhof and...@beekhof.net wrote: On 12 Nov 2013, at 2:46 am, Vladislav Bogdanov bub...@hoster-ok.com wrote: 11.11.2013 09:00, Vladislav Bogdanov wrote: ... Looking at crm-fence-peer.sh script, it would

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Vladislav Bogdanov
12.11.2013 03:15, Andrew Beekhof wrote: Can you try with these two patches please? + Andrew Beekhof (4 seconds ago) fec946a: Fix: crmd: When the DC gracefully shuts down, record the new expected state into the cib (HEAD, master) + Andrew Beekhof (10 seconds ago) 740122a: Fix: crmd: When a