Hi Luke,


I've seen the same apparent grouping of nodes. I don't believe the nodes are 
actually being grouped; rather, the "Device bond0:" line and column headers are 
re-printed to screen whenever a node with the "init" status is followed by a 
node that is "connected". It is something I've noticed on many different 
versions of GPFS, so I imagine it's a "feature".



I've not noticed anything other than '0' in the err column, so I'm not sure 
whether these correspond to error codes in the GPFS logs. If you run the 
command "mmfsadm dump tscomm", you'll see a bit more detail than 
"mmdiag --network" shows; that output suggests the sock column is the number 
of sockets. I've also seen low numbers for sent/recvd using mmdiag --network; 
again, I've found the mmfsadm command above gives a better representation.
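
For what it's worth, here's a rough sketch of how you might flag nodes that 
aren't "connected" by parsing the mmdiag --network output. It assumes the 
data-row layout shown in your output below (hostname, <c0nN>, destination, 
status, err, sock, sent, recvd, ostype); the field positions may well differ 
between GPFS releases, so treat this as illustrative rather than definitive.

#!/usr/bin/env python3
"""Sketch: flag GPFS nodes whose mmdiag --network status isn't 'connected'."""
import subprocess


def non_connected(output):
    """Return (hostname, status) for each data row not in 'connected' state."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have 9 columns and a <c0nN> identifier in column 2;
        # the repeated "Device bond0:" and header lines are skipped.
        if len(fields) == 9 and fields[1].startswith("<c0n"):
            hostname, status = fields[0], fields[3]
            if status != "connected":
                flagged.append((hostname, status))
    return flagged


if __name__ == "__main__":
    out = subprocess.run(["mmdiag", "--network"],
                         capture_output=True, text=True, check=True).stdout
    for host, status in non_connected(out):
        print(f"{host}: {status}")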



All that being said, if you want to get in touch with us then we'll happily 
open a PMR for you and find out the answers to any of your questions.


Kind regards,


Danny Metcalfe
Systems Engineer
OCF plc

Tel: 0114 257 2200
Fax: 0114 257 0022
Mob: 07960 503404

Twitter<http://twitter.com/ocfplc>
Blog<http://blog.ocf.co.uk/>
Web<http://www.ocf.co.uk/>



Please note, any emails relating to an OCF Support request must always be sent 
to supp...@ocf.co.uk<mailto:supp...@ocf.co.uk> for a ticket number to be 
generated or existing support ticket to be updated. Should this not be done 
then OCF cannot be held responsible for requests not dealt with in a timely 
manner.

OCF plc is a company registered in England and Wales.  Registered number 
4132533. Registered office address: OCF plc, 5 Rotunda Business Centre, 
Thorncliffe Park, Chapeltown, Sheffield, S35 2PG

This message is private and confidential. If you have received this message in 
error, please notify us immediately and remove it from your system.


-----Original Message-----
From: gpfsug-discuss-boun...@gpfsug.org 
[mailto:gpfsug-discuss-boun...@gpfsug.org] On Behalf Of Luke Raimbach
Sent: 09 September 2014 11:24
To: gpfsug-discuss@gpfsug.org
Subject: [gpfsug-discuss] mmdiag output questions



Hi All,



When tracing a problem recently (which turned out to be a NIC failure), mmdiag 
proved useful in tracing broken cluster connections. I have some questions 
about the output of mmdiag using the --network switch:



Occasionally I see nodes in the same cluster grouped, but in no readily 
identifiable way - for example, the following output has three headings "Device 
bond0:" with some nodes listed under each, but the nodes don't seem to share 
anything in common like status, err, ostype, etc.



Also, is anyone able to explain what might be seen under the err column? Do 
these correspond to GPFS error codes as one might see in mmfs.log.latest? What 
is the sock column displaying - the number of open sockets or the socket state? 
Lastly, the sent/recvd columns seem very low. Is there a rolling time window 
within which these statistics are kept in some internal mmfsd buffer?



Cheers.



=== mmdiag: network ===

Pending messages:
  (none)
Inter-node communication configuration:
  tscTcpPort      1191
  my address      10.100.10.51/22 (eth0) <c0n8>
  my addr list    10.200.21.1/16 (bond0)/cpdn.oerc.local  10.100.10.51/22 (eth0)
  my node number  9
TCP Connections between nodes:
  Device bond0:
    hostname                            node     destination     status     err  sock  sent(MB)  recvd(MB)  ostype
    gpfs01                              <c0n0>   10.200.1.1      connected  0    32    110       110        Linux/L
    gpfs02                              <c0n1>   10.200.2.1      connected  0    36    104       104        Linux/L
    linux                               <c0n2>   10.200.101.1    connected  0    37    0         0          Linux/L
    jupiter                             <c0n3>   10.200.102.1    connected  0    35    0         0          Windows/L
    cnfs0                               <c0n4>   10.200.10.10    connected  0    39    0         0          Linux/L
    cnfs1                               <c0n5>   10.200.10.11    init       0    -1    0         0          Linux/L
  Device bond0:
    hostname                            node     destination     status     err  sock  sent(MB)  recvd(MB)  ostype
    cnfs2                               <c0n6>   10.200.10.12    connected  0    33    5         5          Linux/L
    cnfs3                               <c0n7>   10.200.10.13    init       0    -1    0         0          Linux/L
    cpdn-ppc02                          <c0n9>   10.200.61.1     init       0    -1    0         0          Linux/L
    cpdn-ppc03                          <c0n10>  10.200.62.1     init       0    -1    0         0          Linux/L
  Device bond0:
    hostname                            node     destination     status     err  sock  sent(MB)  recvd(MB)  ostype
    cpdn-ppc01                          <c0n11>  10.200.60.1     connected  0    38    0         0          Linux/L
diag verbs: VERBS RDMA class not initialized





Conversely, the output of mmdiag --network on the file system manager node for 
the same cluster looks like this:



=== mmdiag: network ===

Pending messages:
  (none)
Inter-node communication configuration:
  tscTcpPort      1191
  my address      10.100.10.21/22 (eth0) <c0n0>
  my addr list    10.200.1.1/16 (bond0)/cpdn.oerc.local  10.100.10.21/22 (eth0)
  my node number  1
TCP Connections between nodes:
  Device bond0:
    hostname                            node     destination     status     err  sock  sent(MB)  recvd(MB)  ostype
    gpfs02                              <c0n1>   10.200.2.1      connected  0    73    219       219        Linux/L
    linux                               <c0n2>   10.200.101.1    connected  0    49    180       181        Linux/L
    jupiter                             <c0n3>   10.200.102.1    connected  0    33    3         3          Windows/L
    cnfs0                               <c0n4>   10.200.10.10    connected  0    61    3         3          Linux/L
    cnfs1                               <c0n5>   10.200.10.11    connected  0    81    0         0          Linux/L
    cnfs2                               <c0n6>   10.200.10.12    connected  0    64    23        23         Linux/L
    cnfs3                               <c0n7>   10.200.10.13    connected  0    60    2         2          Linux/L
    tsm01                               <c0n8>   10.200.21.1     connected  0    50    110       110        Linux/L
    cpdn-ppc02                          <c0n9>   10.200.61.1     connected  0    63    0         0          Linux/L
    cpdn-ppc03                          <c0n10>  10.200.62.1     connected  0    65    0         0          Linux/L
    cpdn-ppc01                          <c0n11>  10.200.60.1     connected  0    62    94        94         Linux/L
diag verbs: VERBS RDMA class not initialized





All neatly connected!





--



Luke Raimbach

IT Manager

Oxford e-Research Centre

7 Keble Road,

Oxford,

OX1 3QG



+44(0)1865 610639

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
