On 3/31/19 5:40 AM, Rohit Saini wrote:
Looking for some help on this.

Thanks,
Rohit

Hi Rohit,

As a good starting point for figuring out what is happening here, could you please provide more detailed information, such as:

1. What is the configuration of the stonith device when using IPv4 and when using IPv6? ('pcs stonith show --full' - you can obfuscate the username and password in that output; the main idea is to see whether you are using a hostname or an IPv4/IPv6 address here. See the example commands after this list.)

2. What does 'sometimes' mean when it happens with IPv6? Is there any pattern to when this happens (for example every night around 3/4 am, when there is more traffic on the network, when you test XXX service, etc.), or does it look to be happening randomly? Are there any other IPv6 issues present on the system, not related to the cluster, at the time the timeout is observed?

3. Are there any messages from fence_ilo4 in the logs (/var/log/pacemaker.log, /var/log/cluster/corosync/corosync.log, /var/log/messages, ...) around the time the timeout is reported that would suggest what could be happening? (Again, see the example commands after this list.)

4. Which version of fence_ilo4 are you using?
# rpm -qa|grep  fence-agents-ipmilan

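To collect the information for points 1 and 3, something along these lines should be enough (the resource name 'fence-uc-orana' and the log paths are simply taken from your output and the list above - adjust them as needed):

# pcs stonith show --full
# grep fence_ilo4 /var/log/messages
# grep fence-uc-orana /var/log/pacemaker.log /var/log/cluster/corosync/corosync.log
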
===
To give you some answers to your questions based on the information provided so far:
> 1. Why is it happening only for IPv6 ILO devices? Is this some known
> issue?
Based on the data provided it is not clear where the issue is. It could be DNS resolution, it could be a network issue, ...
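
For example, to rule out name resolution and basic IPv6 reachability, and to see how long the fence agent itself takes, you could run something like the following on the node reporting the timeout (the hostname/address and credentials below are placeholders, not taken from your configuration):

# getent ahostsv6 <ilo-hostname>
# ping6 -c 3 <ilo-ipv6-address>
# time fence_ilo4 --ip=<ilo-ipv6-address> --username=<user> --password=<pass> --action=status

If the last command regularly takes close to or over 20 seconds with the IPv6 address but not with the IPv4 one, that would point to the network/IPMI side rather than to the cluster configuration.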

> 2. Can we increase the timeout period "exec=20006ms" to something else.
Yes, you can do that, and it may hide/"resolve" the issue if fence_ilo4 can finish monitoring within the newly set timeout. You can give it a try and increase it to 40 seconds to see whether that yields better results in your environment. The default 20 seconds should be enough for the majority of environments, but there might be something in your case that requires more time. Note that this approach might just effectively hide the underlying issue. To increase the timeout you should increase it for both the 'start' and the 'monitor' operation, for example like this:

# pcs stonith update fence-uc-orana op start timeout=40s op monitor timeout=40s
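
Afterwards you can verify that the new timeouts were applied (assuming the device is still named 'fence-uc-orana' as in your output):

# pcs stonith show fence-uc-orana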

--
Ondrej


On Thu, Mar 28, 2019 at 11:24 AM Rohit Saini <rohitsaini111.fo...@gmail.com> wrote:

    Hi All,
    I am trying fence_ilo4 with the same ILO device using an IPv4 and an IPv6
    address. I see some discrepancy between the two behaviours:

    *1. When ILO has IPv4 address*
    This is working fine and stonith resources are started immediately.

    *2. When ILO has IPv6 address*
    Starting the stonith resources sometimes takes more than 20 seconds.

    *[root@tigana ~]# pcs status*
    Cluster name: ucc
    Stack: corosync
    Current DC: tigana (version 1.1.16-12.el7-94ff4df) - partition with
    quorum
    Last updated: Wed Mar 27 00:01:37 2019
    Last change: Wed Mar 27 00:01:19 2019 by root via cibadmin on orana

    2 nodes configured
    4 resources configured

    Online: [ orana tigana ]

    Full list of resources:

      Master/Slave Set: unicloud-master [unicloud]
          Masters: [ orana ]
          Slaves: [ tigana ]
      fence-uc-orana (stonith:fence_ilo4):   FAILED orana
      fence-uc-tigana        (stonith:fence_ilo4):   Started orana

    Failed Actions:
    * fence-uc-orana_start_0 on orana 'unknown error' (1): call=32,
    status=Timed Out, exitreason='none',
         last-rc-change='Wed Mar 27 00:01:17 2019', queued=0ms,
    exec=20006ms *<<<<<<<*


    *Queries:*
    1. Why is it happening only for IPv6 ILO devices? Is this some known
    issue?
    2. Can we increase the timeout period "exec=20006ms" to something else.


    Thanks,
    Rohit
