Hi -- same thing here: everything after 10 nodes gets truncated.
Though I don't have an issue with it, I'll open a PMR, and I recommend you do the same. ;-)

the reason seems simple: it is the "| tail" at the end of the command, which truncates the output to the last 10 lines...
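For reference, tail's default of keeping only the last 10 lines is easy to reproduce with any input:

```shell
# tail with no arguments keeps only the last 10 lines of its input,
# so a 26-line list loses the first 16 entries:
seq 1 26 | tail | wc -l    # prints 10
```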

should be easy to fix..
cheers
olaf





From:        Jonathon A Anderson <[email protected]>
To:        "[email protected]" <[email protected]>
Date:        01/30/2017 11:11 PM
Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
Sent by:        [email protected]




In trying to figure this out on my own, I’m relatively certain I’ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm?


Here are the details of my investigation:


## GPFS is up on sgate2

[root@sgate2 ~]# mmgetstate

Node number  Node name        GPFS state
------------------------------------------
    414      sgate2-opa       active


## but if I tell CES to explicitly put one of our CES addresses on that node, it says that GPFS is down

[root@sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa
mmces address move: GPFS is down on this node.
mmces address move: Command failed. Examine previous error messages to determine cause.


## the “GPFS is down on this node” message is defined as code 109 in mmglobfuncs

[root@sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs
   109 ) msgTxt=\
"%s: GPFS is down on this node."


## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as “down” by getDownCesNodeList

[root@sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress
 downNodeList=$(getDownCesNodeList)
 for downNode in $downNodeList
 do
   if [[ $toNodeName == $downNode ]]
   then
     printErrorMsg 109 "$mmcmd"


## getDownCesNodeList is the set of CES nodes that do *not* appear among the GPFS cluster nodes listed by `tsctl shownodes up`

[root@sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs
function getDownCesNodeList
{
 typeset sourceFile="mmcesfuncs.sh"
 [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x
 $mmTRACE_ENTER "$*"

 typeset upnodefile=${cmdTmpDir}upnodefile
 typeset downNodeList

 # get all CES nodes
 $sort -o $nodefile $mmfsCesNodes.dae

 $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile

 downNodeList=$($comm -23 $nodefile $upnodefile)
 print -- $downNodeList
}  #----- end of function getDownCesNodeList --------------------
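A minimal standalone sketch of that set difference (the file names below are illustrative, not the real GPFS temp files):

```shell
# comm -23 prints lines unique to the first sorted file -- here, the CES
# nodes missing from the "up" list, mirroring getDownCesNodeList:
printf 'sgate1-opa\nsgate2-opa\nshas0251-opa\n' | sort > /tmp/gr_cesnodes
printf 'shas0251-opa\nshas0252-opa\n' | sort > /tmp/gr_upnodes
comm -23 /tmp/gr_cesnodes /tmp/gr_upnodes
# prints:
# sgate1-opa
# sgate2-opa
```

So any node whose name is missing from (or mangled in) the `tsctl` output lands on the "down" list, even if GPFS is active there.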


## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is also obviously and erroneously truncated

[root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail
shas0251-opa.rc.int.colorado.edu
shas0252-opa.rc.int.colorado.edu
shas0253-opa.rc.int.colorado.edu
shas0254-opa.rc.int.colorado.edu
shas0255-opa.rc.int.colorado.edu
shas0256-opa.rc.int.colorado.edu
shas0257-opa.rc.int.colorado.edu
shas0258-opa.rc.int.colorado.edu
shas0259-opa.rc.int.colorado.edu
shas0260-opa.rc.int.col[root@sgate2 ~]#


## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`.
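A fixed-size output buffer would produce exactly this symptom: cutting the comma-separated node list at an arbitrary byte count clips the final hostname mid-string. A sketch of the effect (the byte offset here is illustrative, not the actual buffer size):

```shell
# Truncating at a fixed byte offset cuts the last hostname mid-string,
# just like the observed `tsctl shownodes up` output:
printf 'shas0259-opa.rc.int.colorado.edu,shas0260-opa.rc.int.colorado.edu' \
  | head -c 52
# prints: shas0259-opa.rc.int.colorado.edu,shas0260-opa.rc.int
```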



On 1/24/17, 12:48 PM, "Jonathon A Anderson" <[email protected]> wrote:

   I think I'm having the same issue described here:
   
   
http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html
   
   Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804)
   
   We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS.
   
   Here are the steps I took:
   
   ---
   mmcrnodeclass protocol -N sgate1-opa,sgate2-opa
   mmcrnodeclass nfs -N sgate1-opa,sgate2-opa
   mmchconfig cesSharedRoot=/gpfs/summit/ces
   mmchcluster --ccr-enable
   mmchnode --ces-enable -N protocol
   mmces service enable NFS
   mmces service start NFS -N nfs
   mmces address add --ces-ip 10.225.71.104,10.225.71.105
   mmces address policy even-coverage
   mmces address move --rebalance
   ---
   
   This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot.
   
   Things I've tried:
   
   * disabling ces on the sgate nodes and re-running the above procedure
   * moving the cluster and filesystem managers to different snsd nodes
   * deleting and re-creating the cesSharedRoot directory
   
   Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:
   
   ---
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
   ---
   
   Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):
   
   ---
   2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
   2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
   ---
   
   For the record, here's the interface I expect to get the address on sgate1:
   
   ---
   11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
   inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
   valid_lft forever preferred_lft forever
   inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
   valid_lft forever preferred_lft forever
   ---
   
   which is a bond of p2p1 and p2p2.
   
   ---
   6: p2p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
   7: p2p2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
   ---
   
   A similar bond0 exists on sgate2.
   
   I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.
   
   

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


