Ah, we have separate server-licensed nodes in the HPC cluster (typically we 
have some stuff for config management, monitoring, etc., so we license those 
as servers).

Agreed the bug should be fixed; I meant that we probably don't see it, as the 
CES cluster is 4 nodes serving protocols (plus some other data-access boxes).

Simon
________________________________________
From: [email protected] 
[[email protected]] on behalf of Jonathon A Anderson 
[[email protected]]
Sent: 31 January 2017 20:11
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

Simon,

This is what I’d usually do, and I’m pretty sure it’d fix the problem; but we 
only have two protocol nodes, so no good way to do quorum in a separate cluster 
of just those two.

Plus, I’d just like to see the bug fixed.

I suppose we could move the compute nodes to a separate cluster, and keep the 
protocol nodes together with the NSD servers; but then I’m back to the age-old 
question of “do I technically violate the GPFS license in order to do the right 
thing architecturally?” (Since you have to nominate GPFS servers in the 
client-only cluster to manage quorum, for nodes that only have client licenses.)

So far, we’re 100% legit, and it’d be better to stay that way.

~jonathon


On 1/31/17, 1:07 PM, "[email protected] on behalf of 
Simon Thompson (Research Computing - IT Services)" 
<[email protected] on behalf of [email protected]> 
wrote:

    We use multicluster for our environment: storage systems in one cluster, 
HPC nodes in another, and protocol nodes in a third.

    According to the docs, this isn't supported, but we haven't seen any 
issues. Note: unsupported, as opposed to broken.

    Simon
    ________________________________________
    From: [email protected] 
[[email protected]] on behalf of Jonathon A Anderson 
[[email protected]]
    Sent: 31 January 2017 17:47
    To: gpfsug main discussion list
    Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

    Yeah, I searched around for places where `tsctl shownodes up` appears in 
the GPFS code I have access to (i.e., the ksh and Python stuff); but it’s only 
in CES. I suspect there just haven’t been that many people exporting CES out of 
an HPC cluster environment.

    ~jonathon


    From: <[email protected]> on behalf of Olaf Weiser 
<[email protected]>
    Reply-To: gpfsug main discussion list <[email protected]>
    Date: Tuesday, January 31, 2017 at 10:45 AM
    To: gpfsug main discussion list <[email protected]>
    Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

    I'll open a PMR here for my env ... the issue may hurt you in a CES env 
only ... but needs to be fixed in core gpfs.base, I think.

    Sent from IBM Verse
    Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses 
to nodes ---

    From:

    "Jonathon A Anderson" <[email protected]>

    To:

    "gpfsug main discussion list" <[email protected]>

    Date:

    Tue. 31.01.2017 17:32

    Subject:

    Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

    ________________________________

    No, I’m having trouble getting this through DDN support because, while we 
have a GPFS server license and GRIDScaler support, apparently we don’t have 
“protocol node” support, so they’ve pushed back on supporting this as an 
overall CES-rooted effort.

    I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS 
developer, do you mind if I cite your info from here in my DDN case to get them 
to open a PMR?

    Thanks.

    ~jonathon


    From: <[email protected]> on behalf of Olaf Weiser 
<[email protected]>
    Reply-To: gpfsug main discussion list <[email protected]>
    Date: Tuesday, January 31, 2017 at 8:42 AM
    To: gpfsug main discussion list <[email protected]>
    Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

    OK ... so obviously it seems that we have several issues.
    The 3983-character limit is obviously a defect.
    Have you already raised a PMR? If so, can you send me the number?




    From:        Jonathon A Anderson <[email protected]>
    To:        gpfsug main discussion list <[email protected]>
    Date:        01/31/2017 04:14 PM
    Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
    Sent by:        [email protected]
    ________________________________



    The tail isn’t the issue; that’s my addition, so that I didn’t have to 
paste the hundred-or-so-line node list into the thread.

    The actual command is

    tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile

    But you can see in my tailed output that the last hostname listed is 
cut off halfway through the hostname. Less obvious in the example, but true, is 
the fact that it’s only showing the first 120 hosts, when we have 403 nodes in 
our GPFS cluster.

    [root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l
    120

    [root@sgate2 ~]# mmlscluster | grep '\-opa' | wc -l
    403

    Perhaps more explicitly, it looks like `tsctl shownodes up` can only 
transmit 3983 characters.

    [root@sgate2 ~]# tsctl shownodes up | wc -c
    3983

    Again, I’m convinced this is a bug, not only because the command doesn’t 
actually produce a list of all of the up nodes in our cluster, but also because 
the last name listed is incomplete.

    [root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1
    shas0260-opa.rc.int.col[root@sgate2 ~]#

    I’d continue my investigation within tsctl itself but, alas, it’s a binary 
with no source code available to me. :)
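    For what it's worth, the fixed-byte-limit hypothesis is easy to simulate 
with synthetic hostnames (everything below is made up; only the 3983-byte cut, 
imposed here with `head -c`, comes from the observed output). A hard cap at 
3983 bytes on a comma-joined list of 32-character hostnames reproduces both 
symptoms at once: `wc -l` reports exactly 120 names, and the last name comes 
out cut mid-string.

```shell
# Hypothetical reproduction: 403 synthetic 32-character hostnames,
# comma-joined like the tsctl output, then hard-capped at 3983 bytes.
seq -f 'shas%04g-opa.rc.int.colorado.edu' 1 403 | paste -sd, - > /tmp/nodes

# 120 commas fit under the cap, so wc -l (which counts newlines) says 120.
head -c 3983 /tmp/nodes | tr ',' '\n' | wc -l        # 120

# The 121st name is cut 23 characters in, mid-hostname.
head -c 3983 /tmp/nodes | tr ',' '\n' | tail -n 1    # shas0121-opa.rc.int.col
```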

    I’m trying to get this opened as a bug / PMR; but I’m still working through 
the DDN support infrastructure. Thanks for reporting it, though.

    For the record:

    [root@sgate2 ~]# rpm -qa | grep -i gpfs
    gpfs.base-4.2.1-2.x86_64
    gpfs.msg.en_US-4.2.1-2.noarch
    gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64
    gpfs.gskit-8.0.50-57.x86_64
    gpfs.gpl-4.2.1-2.noarch
    nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64
    gpfs.ext-4.2.1-2.x86_64
    gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64
    gpfs.docs-4.2.1-2.noarch

    ~jonathon


    From: <[email protected]> on behalf of Olaf Weiser 
<[email protected]>
    Reply-To: gpfsug main discussion list <[email protected]>
    Date: Tuesday, January 31, 2017 at 1:30 AM
    To: gpfsug main discussion list <[email protected]>
    Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

    Hi ... same thing here: everything after 10 nodes is truncated.
    Though I don't have an issue with it, I'll open a PMR, and I recommend you 
do the same thing. ;-)

    The reason seems simple: it is the "| tail" at the end of the command, 
which truncates the output to the last 10 items.

    should be easy to fix..
    cheers
    olaf





    From:        Jonathon A Anderson <[email protected]>
    To:        "[email protected]" 
<[email protected]>
    Date:        01/30/2017 11:11 PM
    Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
    Sent by:        [email protected]
    ________________________________




    In trying to figure this out on my own, I’m relatively certain I’ve found a 
bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any 
chance someone in development can confirm?


    Here are the details of my investigation:


    ## GPFS is up on sgate2

    [root@sgate2 ~]# mmgetstate

    Node number  Node name        GPFS state
    ------------------------------------------
       414      sgate2-opa       active


    ## but if I tell ces to explicitly put one of our ces addresses on that 
node, it says that GPFS is down

    [root@sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node 
sgate2-opa
    mmces address move: GPFS is down on this node.
    mmces address move: Command failed. Examine previous error messages to 
determine cause.


    ## the “GPFS is down on this node” message is defined as code 109 in 
mmglobfuncs

    [root@sgate2 ~]# grep --before-context=1 "GPFS is down on this node." 
/usr/lpp/mmfs/bin/mmglobfuncs
      109 ) msgTxt=\
    "%s: GPFS is down on this node."


    ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects 
that the current node is identified as “down” by getDownCesNodeList

    [root@sgate2 ~]# grep --before-context=5 'printErrorMsg 109' 
/usr/lpp/mmfs/bin/mmcesnetmvaddress
    downNodeList=$(getDownCesNodeList)
    for downNode in $downNodeList
    do
      if [[ $toNodeName == $downNode ]]
      then
        printErrorMsg 109 "$mmcmd"


    ## getDownCesNodeList is the intersection of all ces nodes with GPFS 
cluster nodes listed in `tsctl shownodes up`

    [root@sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' 
/usr/lpp/mmfs/bin/mmcesfuncs
    function getDownCesNodeList
    {
    typeset sourceFile="mmcesfuncs.sh"
    [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x
    $mmTRACE_ENTER "$*"

    typeset upnodefile=${cmdTmpDir}upnodefile
    typeset downNodeList

    # get all CES nodes
    $sort -o $nodefile $mmfsCesNodes.dae

    $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile

    downNodeList=$($comm -23 $nodefile $upnodefile)
    print -- $downNodeList
    }  #----- end of function getDownCesNodeList --------------------
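    The consequence of that `comm -23` is that any CES node missing from the 
(truncated) up list gets reported as down, even if it's healthy. A toy 
illustration with made-up node names and temp files (nothing below comes from 
the real cluster):

```shell
# Two CES nodes; the truncated "up" list only got as far as sgate1-opa,
# so comm -23 (lines unique to the first sorted file) reports sgate2-opa
# as down even though it may be up.
printf 'sgate1-opa\nsgate2-opa\n' | sort > /tmp/nodefile
printf 'sgate1-opa\nshas0001-opa\n' | sort > /tmp/upnodefile
comm -23 /tmp/nodefile /tmp/upnodefile    # sgate2-opa
```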


    ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its 
output is obviously and erroneously truncated

    [root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail
    shas0251-opa.rc.int.colorado.edu
    shas0252-opa.rc.int.colorado.edu
    shas0253-opa.rc.int.colorado.edu
    shas0254-opa.rc.int.colorado.edu
    shas0255-opa.rc.int.colorado.edu
    shas0256-opa.rc.int.colorado.edu
    shas0257-opa.rc.int.colorado.edu
    shas0258-opa.rc.int.colorado.edu
    shas0259-opa.rc.int.colorado.edu
    shas0260-opa.rc.int.col[root@sgate2 ~]#


    ## I expect that this is a bug in GPFS, likely related to a maximum output 
buffer for `tsctl shownodes up`.



    On 1/24/17, 12:48 PM, "Jonathon A Anderson" 
<[email protected]> wrote:

      I think I'm having the same issue described here:

      
http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html

      Any advice or further troubleshooting steps would be much appreciated. 
Full disclosure: I also have a DDN case open. (78804)

      We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to 
add two CES protocol nodes (sgate{1,2}) to serve NFS.

      Here's the steps I took:

      ---
      mmcrnodeclass protocol -N sgate1-opa,sgate2-opa
      mmcrnodeclass nfs -N sgate1-opa,sgate2-opa
      mmchconfig cesSharedRoot=/gpfs/summit/ces
      mmchcluster --ccr-enable
      mmchnode --ces-enable -N protocol
      mmces service enable NFS
      mmces service start NFS -N nfs
      mmces address add --ces-ip 10.225.71.104,10.225.71.105
      mmces address policy even-coverage
      mmces address move --rebalance
      ---

      This worked the very first time I ran it, but the CES addresses weren't 
re-distributed after restarting GPFS or a node reboot.

      Things I've tried:

      * disabling ces on the sgate nodes and re-running the above procedure
      * moving the cluster and filesystem managers to different snsd nodes
      * deleting and re-creating the cesSharedRoot directory

      Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:

      ---
      Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned 
address 10.225.71.104
      Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned 
address 10.225.71.105
      Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem 
with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
      Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 
10.225.71.104_0-_+,10.225.71.105_0-_+
      Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 
10.225.71.104_0-_+,10.225.71.105_0-_+
      ---

      Also notable, whenever I add or remove addresses now, I see this in 
mmsysmonitor.log (among a lot of other entries):

      ---
      2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without 
requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - 
Service.calculateAndUpdateState:275
      2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities 
at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
      ---

      For the record, here's the interface I expect to get the address on 
sgate1:

      ---
      11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc 
noqueue state UP
      link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
      inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
      valid_lft forever preferred_lft forever
      inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
      valid_lft forever preferred_lft forever
      ---

      which is a bond of p2p1 and p2p2.

      ---
      6: p2p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master 
bond0 state UP qlen 1000
      link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
      7: p2p2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master 
bond0 state UP qlen 1000
      link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
      ---

      A similar bond0 exists on sgate2.

      I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a 
while trying to figure it out, but have been unsuccessful so far.



    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss











