On 3/31/19 5:40 AM, Rohit Saini wrote:
Looking for some help on this.

Thanks,
Rohit

Hi Rohit,

As a good starting point for figuring out what is happening here, could you please provide more detailed information, such as:

1. What is the configuration of the stonith device when using IPv4 and when using IPv6? ('pcs stonith show --full' - you can obfuscate the username and password in that output; the main idea is to see whether you are using a hostname or an IPv4/IPv6 address here. See the example commands after this list.)

2. What does 'sometimes' mean when it happens with IPv6? Is there any pattern to when this happens (for example every night around 3/4 am, when there is more traffic on the network, when you test XXX service, etc.), or does it look to be happening randomly? Are there any other IPv6 issues present on the system, not related to the cluster, at the time the timeout is observed?

3. Are there any messages from fence_ilo4 in the logs (/var/log/pacemaker.log, /var/log/cluster/corosync/corosync.log, /var/log/messages, ...) around the time the timeout is reported that would suggest what could be happening? (Again, see the example commands after this list.)

4. Which version of fence_ilo4 are you using?
# rpm -qa|grep  fence-agents-ipmilan

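To collect the information for points 1 and 3, something along these lines should be enough (the resource name 'fence-uc-orana' and the log paths are simply taken from your output and the list above - adjust them as needed):

# pcs stonith show --full
# grep fence_ilo4 /var/log/messages
# grep fence-uc-orana /var/log/pacemaker.log /var/log/cluster/corosync/corosync.log
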
===
To give you some answers to your questions based on the information provided so far:
> 1. Why is it happening only for IPv6 ILO devices? Is this some known
> issue?
Based on the data provided it is not clear where the issue is. It could be DNS resolution, it could be a network issue, ...
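
For example, to rule out name resolution and basic IPv6 reachability, and to see how long the fence agent itself takes, you could run something like the following on the node reporting the timeout (the hostname/address and credentials below are placeholders, not taken from your configuration):

# getent ahostsv6 <ilo-hostname>
# ping6 -c 3 <ilo-ipv6-address>
# time fence_ilo4 --ip=<ilo-ipv6-address> --username=<user> --password=<pass> --action=status

If the last command regularly takes close to or over 20 seconds with the IPv6 address but not with the IPv4 one, that would point to the network/IPMI side rather than to the cluster configuration.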

> 2. Can we increase the timeout period "exec=20006ms" to something else.
Yes, you can do that, and it may hide/"resolve" the issue if fence_ilo4 can finish monitoring within the newly set timeout. You can give it a try and increase it to 40 seconds to see whether that yields better results in your environment. The default 20 seconds should be enough for the majority of environments, but there might be something in your case that requires more time. Note that this approach might just effectively hide the underlying issue. To increase the timeout you should increase it for both the 'start' and the 'monitor' operation, for example like this:

# pcs stonith update fence-uc-orana op start timeout=40s op monitor timeout=40s
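
Afterwards you can verify that the new timeouts were applied (assuming the device is still named 'fence-uc-orana' as in your output):

# pcs stonith show fence-uc-orana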

--
Ondrej


On Thu, Mar 28, 2019 at 11:24 AM Rohit Saini <rohitsaini111.fo...@gmail.com> wrote:

    Hi All,
    I am trying fence_ilo4 with the same ILO device using an IPv4 and an IPv6
    address. I see some discrepancy between the two behaviours:

    *1. When ILO has IPv4 address*
    This is working fine and stonith resources are started immediately.

    *2. When ILO has IPv6 address*
    Starting the stonith resources sometimes takes more than 20 seconds.

    *[root@tigana ~]# pcs status*
    Cluster name: ucc
    Stack: corosync
    Current DC: tigana (version 1.1.16-12.el7-94ff4df) - partition with
    quorum
    Last updated: Wed Mar 27 00:01:37 2019
    Last change: Wed Mar 27 00:01:19 2019 by root via cibadmin on orana

    2 nodes configured
    4 resources configured

    Online: [ orana tigana ]

    Full list of resources:

      Master/Slave Set: unicloud-master [unicloud]
          Masters: [ orana ]
          Slaves: [ tigana ]
      fence-uc-orana (stonith:fence_ilo4):   FAILED orana
      fence-uc-tigana        (stonith:fence_ilo4):   Started orana

    Failed Actions:
    * fence-uc-orana_start_0 on orana 'unknown error' (1): call=32,
    status=Timed Out, exitreason='none',
         last-rc-change='Wed Mar 27 00:01:17 2019', queued=0ms,
    exec=20006ms *<<<<<<<*


    *Queries:*
    1. Why is it happening only for IPv6 ILO devices? Is this some known
    issue?
    2. Can we increase the timeout period "exec=20006ms" to something else.


    Thanks,
    Rohit
