Hi Jaime, I'd suggest you trace a client while it tries to connect and check which addresses it actually attempts to talk to. It is a bit tedious, but you will be able to find this in the trace report file, and you might also get an idea of what's going wrong...
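For reference, a minimal sketch of how such a trace might be taken with the standard trace facility (the node name below is a hypothetical placeholder for one of the affected clients):

```shell
# Start a low-level GPFS trace on the affected client (run as root).
# "c0client01" is a hypothetical client node name in cluster 0.
mmtracectl --start -N c0client01

# Reproduce the failure on that client.
mmmount wosgpfs

# Stop tracing; the trace report is written under /tmp/mmfs by default
# and shows which daemon addresses the client actually contacted.
mmtracectl --stop -N c0client01
```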
Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: [email protected]
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Management (Geschäftsführung): Andreas Hasse, Thomas Wolter
Registered office (Sitz der Gesellschaft): Ehningen / Registration court (Registergericht): Amtsgericht Stuttgart, HRB 17122

From: "Jaime Pinto" <[email protected]>
To: "gpfsug main discussion list" <[email protected]>
Date: 05/08/2017 06:06 PM
Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable
Sent by: [email protected]

We have a setup in which "cluster 0" is made up of clients only, on GPFS v3.5, i.e. no NSDs or formal storage in this primary membership. All storage for those clients comes, in multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7).

We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" when clients try to mount, even though there were no issues or errors with the multi-cluster setup itself: 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 (also 4.1) just fine.

Another piece of information: as far as GPFS goes, all clusters are configured to communicate exclusively over InfiniBand, each on a different 10.20.x.x network, but with broadcast 10.20.255.255. As far as the IB network goes, there are no problems routing/pinging between the clusters. So this must be internal to GPFS.
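Since plain ping works between the clusters, one further check along these lines is whether the GPFS daemon port itself is reachable over TCP; a sketch, assuming the default daemon port 1191 and using the cluster-4 NSD server address from the log excerpt below:

```shell
# ICMP ping succeeding does not guarantee the daemon port is open.
# Test a TCP connection to the GPFS daemon port (default 1191) on the
# cluster-4 contact node; 10.20.179.1 is the NSD server from the log.
if timeout 5 bash -c 'exec 3<>/dev/tcp/10.20.179.1/1191' 2>/dev/null; then
    echo "daemon port reachable"
else
    echo "daemon port unreachable (firewall or wrong interface?)"
fi
```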
None of the clusters have the subnets parameter set explicitly in the configuration, and from reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (I don't think that matters).

Below is an excerpt of /var/mmfs/gen/mmfslog on one of the clients during mount (10.20.179.1 is one of the NSD servers on cluster 4):

Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0
Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side).
Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0
Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs
Mon May 8 11:35:28.783 2017: Network is unreachable

I see this reference to "TLS handshake" and error 447; however, according to the manual, TLS only becomes the default from 4.2 onwards, not in the 4.1.1-14 we have now, where it's supposed to be EMPTY.

mmdiag --network on some of the clients gives this excerpt ("broken" status):

tapenode-ib0       <c4p1>  10.20.83.5    broken  233  -1  0  0  Linux/L
gpc-f114n014-ib0   <c4p2>  10.20.114.14  broken  233  -1  0  0  Linux/L
gpc-f114n015-ib0   <c4p3>  10.20.114.15  broken  233  -1  0  0  Linux/L
gpc-f114n016-ib0   <c4p4>  10.20.114.16  broken  233  -1  0  0  Linux/L
wos-gateway01-ib0  <c4p5>  10.20.179.1   broken  233  -1  0  0  Linux/L

I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshooting guide is not helping).

Thanks
Jaime

---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477

----------------------------------------------------------------
This message was sent using IMP at SciNet Consortium, University of Toronto.
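Given that the failure is a TLS handshake error rather than a routing one, a sketch of comparing the authentication state each side has on file (standard mmauth/mmremotecluster administration commands; a stale or mismatched exchanged key between the client cluster and cluster 4 is one possible cause):

```shell
# On the client cluster: list the remote clusters it knows about,
# including the key file and cipherList recorded for each.
mmremotecluster show all

# On each cluster (in particular cluster 4): show the authentication
# configuration in effect for local and remote cluster access.
mmauth show all
```

If the keys disagree, re-exchanging them (mmauth genkey on the storage cluster, then updating the client cluster's mmremotecluster entry) may be worth trying.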
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
