Hi Chris,
I've seen this several times too. Could you please report it through a regular PMR?
Thanks in advance.
Cheers,
Olaf



From:        "Fey, Christian" <[email protected]>
To:        gpfsug main discussion list <[email protected]>
Date:        04/10/2017 06:04 PM
Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
Sent by:        [email protected]




Hi,

I'm currently dealing with what may be a similar issue, which also seems to be related to the output of "tsctl shownodes up" (before CES, I had never actually had to deal with this command).

In my case, "mmlscluster", for example, shows the nodes as "node1.acme.local", but "tsctl shownodes up" displays them as "node1.acme.local.acme.local".

This may be why a fresh CES deployment in an existing GPFS cluster also fails to distribute IP addresses. Instead, it loops in the same way it did in your case, @jonathon. I think it searches for "node1.acme.local" but doesn't find it, since tsctl shows the name with the suffix doubled.

Can anyone explain where "tsctl shownodes up" reads its data from? Also, does anyone have an idea why the DNS suffix is doubled?
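To check whether the doubled suffix alone explains the mismatch, the names could be normalized before comparing them with the mmlscluster list. This is a hypothetical sketch: the helper names strip_doubled_suffix and normalize_up_list are invented for illustration, and acme.local is just the example domain from above. It only strips a suffix that appears twice in a row; it does not fix whatever produces the doubled name in the first place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip_doubled_suffix NAME DOMAIN
# If NAME ends in ".DOMAIN.DOMAIN" (e.g. node1.acme.local.acme.local),
# print it with one copy of ".DOMAIN" removed; otherwise print NAME unchanged.
strip_doubled_suffix() {
  local name=$1 domain=$2
  case $name in
    *".$domain.$domain") printf '%s\n' "${name%".$domain"}" ;;
    *) printf '%s\n' "$name" ;;
  esac
}

# Hypothetical helper: normalize_up_list "a,b,c" DOMAIN
# Splits a comma-separated list (the shape of `tsctl shownodes up` output)
# into one name per line and normalizes each non-empty name.
normalize_up_list() {
  local list=$1 domain=$2 name
  printf '%s\n' "$list" | tr ',' '\n' | while IFS= read -r name; do
    if [ -n "$name" ]; then
      strip_doubled_suffix "$name" "$domain"
    fi
  done
}

# Example with the names from this message.
normalize_up_list 'node1.acme.local.acme.local,node2.acme.local' acme.local
```

The normalized output could then be sorted and compared with the mmlscluster names to see whether the suffix is the only difference.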


Kind regards
Christian

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jonathon A Anderson
Sent: Thursday, 23 March 2017 16:02
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes


Thanks! I’m looking forward to upgrading our CES nodes and resuming work on the project.

~jonathon


On 3/23/17, 8:24 AM, "[email protected] on behalf of Olaf Weiser" <[email protected] on behalf of [email protected]> wrote:

   The issue is fixed;
   an APAR (IV93100) will be released soon.
   
   
   
   From:        Olaf Weiser/Germany/IBM@IBMDE
   To:        "gpfsug main discussion list" <[email protected]>
   Cc:        "gpfsug main discussion list" <[email protected]>
   Date:        01/31/2017 11:47 PM
   Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   Sent by:        [email protected]
   ________________________________________
   
   
   
   Yes... whether you're affected depends on the number of nodes.
   So if your remote CES cluster is small enough in terms of node count, you'll never run into this issue.
   
   Sent from IBM Verse
   
   Simon Thompson (Research Computing - IT Services) --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ---
   
   From: "Simon Thompson (Research Computing - IT Services)" <[email protected]>
   To: "gpfsug main discussion list" <[email protected]>
   Date: Tue 31.01.2017 21:07
   Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   ________________________________________
   
   We use multicluster in our environment: the storage systems are in one cluster, the HPC nodes in a separate cluster, and the protocol nodes in another separate cluster.
   
   According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken.
   
   Simon
   ________________________________________
   From: [email protected] [[email protected]] on behalf of Jonathon A Anderson [[email protected]]
   Sent: 31 January 2017 17:47
   To: gpfsug main discussion list
   Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   
   Yeah, I searched for places where `tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and Python scripts), but it’s only in CES. I suspect there just haven’t been many people exporting CES out of an HPC cluster environment.
   
   ~jonathon
   
   
   From: <[email protected]> on behalf of Olaf Weiser <[email protected]>
   Reply-To: gpfsug main discussion list <[email protected]>
   Date: Tuesday, January 31, 2017 at 10:45 AM
   To: gpfsug main discussion list <[email protected]>
   Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   
   I'll open a PMR here for my environment... the issue may only hurt you in a CES environment, but I think it needs to be fixed in the core gpfs.base.
   
   Sent from IBM Verse
   Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ---
   
   From: "Jonathon A Anderson" <[email protected]>
   To: "gpfsug main discussion list" <[email protected]>
   Date: Tue 31.01.2017 17:32
   Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   
   ________________________________
   
   No, I’m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don’t have “protocol node” support, so they’ve pushed back on supporting this as an overall CES-rooted effort.
   
   I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR?
   
   Thanks.
   
   ~jonathon
   
   
   From: <[email protected]> on behalf of Olaf Weiser <[email protected]>
   Reply-To: gpfsug main discussion list <[email protected]>
   Date: Tuesday, January 31, 2017 at 8:42 AM
   To: gpfsug main discussion list <[email protected]>
   Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   
   OK... so it seems that we have several issues.
   The 3983-character limit is obviously a defect.
   Have you already raised a PMR? If so, can you send me the number?
   
   
   
   
   From:        Jonathon A Anderson <[email protected]>
   To:        gpfsug main discussion list <[email protected]>
   Date:        01/31/2017 04:14 PM
   Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   Sent by:        [email protected]
   ________________________________
   
   
   
   The tail isn’t the issue; that’s my addition, so that I didn’t have to paste the hundred-or-so-line node list into the thread.
   
   The actual command is
   
   tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile
   
   But you can see in my tailed output that the last hostname listed is cut off halfway through. Less obvious in the example, but true, is that it’s only showing the first 120 hosts, when we have 403 nodes in our GPFS cluster.
   
   [root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l
   120
   
   [root@sgate2 ~]# mmlscluster | grep '\-opa' | wc -l
   403
   
   Put more explicitly, it looks like `tsctl shownodes up` can only output 3983 characters.
   
   [root@sgate2 ~]# tsctl shownodes up | wc -c
   3983
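As a rough cross-check of these numbers: if every entry were 32 characters long like shas0251-opa.rc.int.colorado.edu (an assumption, since the real list may also contain names of other lengths), a 3983-character limit would hold exactly 120 complete comma-separated names plus 23 characters of the next one. That matches both the 120 lines counted above (tr turns 120 commas into 120 newlines, and the unterminated fragment is not counted by wc -l) and the 23-character fragment "shas0260-opa.rc.int.col". The helper name fit_entries is invented for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fit_entries LIMIT NAMELEN
# Prints how many complete comma-separated names of NAMELEN characters fit
# in LIMIT characters, and how many characters of the next name follow them.
fit_entries() {
  local limit=$1 len=$2 full used rest partial=0
  # n names need n*len characters plus n-1 commas between them.
  full=$(( (limit + 1) / (len + 1) ))
  used=$(( full * len + full - 1 ))
  rest=$(( limit - used ))
  # Any leftover beyond one separating comma is a partial name.
  if (( rest > 1 )); then
    partial=$(( rest - 1 ))
  fi
  echo "full=$full partial=$partial"
}

fit_entries 3983 32   # prints: full=120 partial=23
```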
   
   Again, I’m convinced this is a bug, not only because the command doesn’t actually produce a list of all of the up nodes in our cluster, but also because the last name listed is incomplete.
   
   [root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1
   shas0260-opa.rc.int.col[root@sgate2 ~]#
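A quick way to look for the same symptom on another cluster is to count the entries and check whether the last one ends in the expected domain. This is a hypothetical sketch: check_up_list is an invented name, the domain suffix has to be supplied by hand, and the entry count still needs to be compared against mmlscluster separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_up_list LIST SUFFIX
# LIST is a comma-separated node list (the format `tsctl shownodes up` prints);
# SUFFIX is the domain every complete node name is expected to end with.
# Prints the entry and character counts, then whether the last entry looks
# complete. Returns 1 if the last entry does not end in SUFFIX.
check_up_list() {
  local list=$1 suffix=$2 count last
  count=$(printf '%s\n' "$list" | tr ',' '\n' | grep -c .)
  last=${list##*,}
  echo "entries=$count chars=${#list}"
  case $last in
    *"$suffix") echo "last entry looks complete: $last" ;;
    *) echo "last entry looks truncated: $last"; return 1 ;;
  esac
}

# On a GPFS node this could be called as (untested here):
#   check_up_list "$(tsctl shownodes up)" .rc.int.colorado.edu
check_up_list 'shas0259-opa.rc.int.colorado.edu,shas0260-opa.rc.int.col' \
  .rc.int.colorado.edu || echo "possible truncation detected"
```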
   
   I’d continue my investigation within tsctl itself but, alas, it’s a binary with no source code available to me. :)
   
   I’m trying to get this opened as a bug / PMR; but I’m still working through the DDN support infrastructure. Thanks for reporting it, though.
   
   For the record:
   
   [root@sgate2 ~]# rpm -qa | grep -i gpfs
   gpfs.base-4.2.1-2.x86_64
   gpfs.msg.en_US-4.2.1-2.noarch
   gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64
   gpfs.gskit-8.0.50-57.x86_64
   gpfs.gpl-4.2.1-2.noarch
   nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64
   gpfs.ext-4.2.1-2.x86_64
   gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64
   gpfs.docs-4.2.1-2.noarch
   
   ~jonathon
   
   
   From: <[email protected]> on behalf of Olaf Weiser <[email protected]>
   Reply-To: gpfsug main discussion list <[email protected]>
   Date: Tuesday, January 31, 2017 at 1:30 AM
   To: gpfsug main discussion list <[email protected]>
   Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   
   Hi... same thing here: everything after 10 nodes is truncated.
   Although I don't have an issue with it, I'll open a PMR, and I recommend you do the same. ;-)

   The reason seems simple: it is the "| tail" at the end of the command, which truncates the output to the last 10 items.

   It should be easy to fix.
   Cheers,
   Olaf
   
   
   
   
   
   From:        Jonathon A Anderson <[email protected]>
   To:        "[email protected]" <[email protected]>
   Date:        01/30/2017 11:11 PM
   Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
   Sent by:        [email protected]
   ________________________________
   
   
   
   
   In trying to figure this out on my own, I’m relatively certain I’ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm?
   
   
   Here are the details of my investigation:
   
   
   ## GPFS is up on sgate2
   
   [root@sgate2 ~]# mmgetstate
   
   Node number  Node name        GPFS state
   ------------------------------------------
     414      sgate2-opa       active
   
   
   ## but if I tell CES to explicitly put one of our CES addresses on that node, it says that GPFS is down
   
   [root@sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa
   mmces address move: GPFS is down on this node.
   mmces address move: Command failed. Examine previous error messages to determine cause.
   
   
   ## the “GPFS is down on this node” message is defined as code 109 in mmglobfuncs
   
   [root@sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs
    109 ) msgTxt=\
   "%s: GPFS is down on this node."
   
   
   ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the target node ($toNodeName, here the current node) is identified as “down” by getDownCesNodeList
   
   [root@sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress
   downNodeList=$(getDownCesNodeList)
   for downNode in $downNodeList
   do
    if [[ $toNodeName == $downNode ]]
    then
      printErrorMsg 109 "$mmcmd"
   
   
   ## getDownCesNodeList lists every CES node that is not among the GPFS cluster nodes reported by `tsctl shownodes up` (comm -23 of the two sorted lists)
   
   [root@sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs
   function getDownCesNodeList
   {
   typeset sourceFile="mmcesfuncs.sh"
   [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] &&set -x
   $mmTRACE_ENTER "$*"
   
   typeset upnodefile=${cmdTmpDir}upnodefile
   typeset downNodeList
   
   # get all CES nodes
   $sort -o $nodefile $mmfsCesNodes.dae
   
   $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile
   
   downNodeList=$($comm -23 $nodefile $upnodefile)
   print -- $downNodeList
   }  #----- end of function getDownCesNodeList --------------------
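The effect of that comm -23 step can be reproduced outside GPFS with plain files. In this hypothetical sketch, down_nodes is an invented name, and the CES node file and the comma-separated up list are stand-ins for $mmfsCesNodes.dae and the `tsctl shownodes up` output. Any CES node missing from a truncated up list is reported as down, even if GPFS is active on it.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the getDownCesNodeList logic.
# down_nodes CES_NODE_FILE UP_LIST
#   CES_NODE_FILE: one CES node name per line (stand-in for $mmfsCesNodes.dae)
#   UP_LIST:       comma-separated names (stand-in for `tsctl shownodes up`)
# Prints every CES node that does not appear in UP_LIST.
down_nodes() {
  local ces_nodes=$1 up_list=$2 tmp
  tmp=$(mktemp -d) || return 1
  sort -o "$tmp/nodefile" "$ces_nodes"
  printf '%s\n' "$up_list" | tr ',' '\n' | sort -o "$tmp/upnodefile"
  # comm -23 suppresses lines unique to file 2 and lines common to both,
  # leaving only the CES nodes absent from the up list.
  comm -23 "$tmp/nodefile" "$tmp/upnodefile"
  rm -rf "$tmp"
}

# Example: the up list stops before the sgate nodes, so both are reported down.
ces=$(mktemp)
printf '%s\n' sgate1-opa sgate2-opa > "$ces"
down_nodes "$ces" 'shas0259-opa,shas0260-op'
rm -f "$ces"
```

Because both files go through the same sort, comm sees them in the same collation order, which is what the real function relies on as well.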
   
   
   ## but not only are the sgate nodes missing from `tsctl shownodes up`, its output is also obviously and erroneously truncated
   
   [root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail
   shas0251-opa.rc.int.colorado.edu
   shas0252-opa.rc.int.colorado.edu
   shas0253-opa.rc.int.colorado.edu
   shas0254-opa.rc.int.colorado.edu
   shas0255-opa.rc.int.colorado.edu
   shas0256-opa.rc.int.colorado.edu
   shas0257-opa.rc.int.colorado.edu
   shas0258-opa.rc.int.colorado.edu
   shas0259-opa.rc.int.colorado.edu
   shas0260-opa.rc.int.col[root@sgate2 ~]#
   
   
   ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`.
   
   
   
   On 1/24/17, 12:48 PM, "Jonathon A Anderson" <[email protected]> wrote:
   
    I think I'm having the same issue described here:
   
   
http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html
   
    Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804)
   
    We've got a four-node (snsd{1..4}) DDN GRIDScaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS.
   
    Here are the steps I took:
   
    ---
    mmcrnodeclass protocol -N sgate1-opa,sgate2-opa
    mmcrnodeclass nfs -N sgate1-opa,sgate2-opa
    mmchconfig cesSharedRoot=/gpfs/summit/ces
    mmchcluster --ccr-enable
    mmchnode --ces-enable -N protocol
    mmces service enable NFS
    mmces service start NFS -N nfs
    mmces address add --ces-ip 10.225.71.104,10.225.71.105
    mmces address policy even-coverage
    mmces address move --rebalance
    ---
   
    This worked the very first time I ran it, but the CES addresses weren't redistributed after restarting GPFS or rebooting a node.
   
    Things I've tried:
   
    * disabling ces on the sgate nodes and re-running the above procedure
    * moving the cluster and filesystem managers to different snsd nodes
    * deleting and re-creating the cesSharedRoot directory
   
    Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:
   
    ---
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
    ---
   
    Also notable: whenever I add or remove addresses now, I see this in mmsysmonitor.log (among many other entries):
   
    ---
    2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
    2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
    ---
   
    For the record, here's the interface I expect to get the address on sgate1:
   
    ---
    11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
    valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
    valid_lft forever preferred_lft forever
    ---
   
    which is a bond of p2p1 and p2p2.
   
    ---
    6: p2p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    7: p2p2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    ---
   
    A similar bond0 exists on sgate2.
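One thing worth ruling out when mmsysmonitor reports "No CES relevant NICs detected" is whether the CES addresses fall inside the subnet of an interface that already has a static address, assuming CES only places an address on such an interface. The sketch below is a hypothetical IPv4-only check (ip_to_int and same_subnet are invented names). With the values above, 10.225.71.104 and 10.225.71.105 both fall within 10.225.64.0/20, the network of bond0's 10.225.71.107/20, so a plain subnet mismatch would not seem to explain the warning.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for an IPv4 subnet check using bash arithmetic.
# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# same_subnet IP ADDR/PREFIX -> exit status 0 if IP lies in ADDR's network
same_subnet() {
  local ip=$1 addr=${2%/*} prefix=${2#*/} mask
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$addr") & mask) ))
}

# Example with the addresses from this thread.
for ces_ip in 10.225.71.104 10.225.71.105; do
  if same_subnet "$ces_ip" 10.225.71.107/20; then
    echo "$ces_ip is in the bond0 subnet"
  else
    echo "$ces_ip is NOT in the bond0 subnet"
  fi
done
```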
   
    I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.
   
   
   
   _______________________________________________
   gpfsug-discuss mailing list
   gpfsug-discuss at spectrumscale.org
   
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
   
