Hi all.

After some help from IBM, we’ve concluded (and been told) that AFM over the NSD 
protocol is effectively unusable once the round-trip time (RTT) exceeds roughly 
50ms. We’ve now proven that ourselves, so it is time to stop treating the NSD 
protocol as a viable option in those conditions (unless IBM considers it 
worthy of an RFE and can fix it!).

The problem we face now is one of parallelism: filling that 
10GbE/40GbE/100GbE pipe efficiently when using NFS as the transport provider 
for AFM.
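
For a rough sense of scale, here is a minimal sketch of the bandwidth-delay product, i.e. how much data has to be in flight to keep a link full. It assumes the roughly 50ms RTT mentioned above applies to this link (the actual RTT is not stated here) and uses a hypothetical 4 MiB effective window per stream; both numbers are illustrative only, and the function names are made up for this example.

```python
import math


def bdp_bytes(link_gbit_per_s: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    bits_per_second = link_gbit_per_s * 1e9
    return bits_per_second * (rtt_ms / 1000.0) / 8.0


def streams_needed(link_gbit_per_s: float, rtt_ms: float, window_bytes: int) -> int:
    """Concurrent streams needed if each stream keeps at most window_bytes in flight."""
    return math.ceil(bdp_bytes(link_gbit_per_s, rtt_ms) / window_bytes)


if __name__ == "__main__":
    ASSUMED_RTT_MS = 50              # assumption: borrowed from the threshold quoted above
    ASSUMED_WINDOW = 4 * 1024 * 1024  # assumption: hypothetical 4 MiB per-stream window
    for link in (10, 40, 100):
        in_flight_mb = bdp_bytes(link, ASSUMED_RTT_MS) / 1e6
        streams = streams_needed(link, ASSUMED_RTT_MS, ASSUMED_WINDOW)
        print(f"{link}GbE: {in_flight_mb:.1f} MB in flight, about {streams} streams")
```

Under those assumptions a single stream cannot fill the pipe on its own, which is why the degree of parallelism matters so much at this latency.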

On my test cluster, at the “Cache” side, I’ve got three nodes designated as 
gateways (mc-6, mc-7 and mc-8):

[root@mc-5 ~]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         sdx-gpfs.xxxxxxxxxxxxxxxx
  GPFS cluster id:           12880500218013865782
  GPFS UID domain:           sdx-gpfs.xxxxxxxxxxxxxxxx
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name           IP address           Admin node name  Designation
---------------------------------------------------------------------------------------
   1   mc-5.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-5.hidden.net  quorum-manager
   2   mc-6.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-6.hidden.net  quorum-manager-gateway
   3   mc-7.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-7.hidden.net  quorum-manager-gateway
   4   mc-8.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-8.hidden.net  quorum-manager-gateway

The bit I really don’t get is:


1. Why no traffic ever seems to go through mc-6 or mc-8 directly back to my 
“home”, and

2. Why my AFM-cache fileset is only ever listed as associated with one 
gateway (mc-7).

I can see traffic flowing through mc-6 sometimes…but when it does, it all seems 
to channel back through mc-7 THEN back to the AFM-home. Am I missing something?

This is where I see one of the gateways listed (but never the others):

[root@mc-5 ~]# mmafmctl afmcachefs getstate
Fileset Name    Fileset Target                        Cache State    Gateway Node    Queue Length   Queue numExec
------------    --------------                        -----------    ------------    ------------   -------------
afm-home        nfs://omnipath2/gpfs-flash/afm-home   Active         mc-7            0              746636

I was told I needed to set up “explicit maps” back to my home cluster to 
achieve parallelism:

[root@mc-5 ~]# mmafmconfig show
Map name:             omnipath1
Export server map:    address.is.hidden.100/mc-6.ip.address.hidden

Map name:             omnipath2
Export server map:    address.is.hidden.101/mc-7.ip.address.hidden

But – I have never seen any traffic come back from mc-6 to omnipath1.

What am I missing, and how do I actually achieve significant enough parallelism 
over an NFS transport to fill my 10GbE pipe?

I’ve seen maybe a couple of gigabits per second from the mc-7 host writing back 
to the omnipath2 host, and that was with me trying my level best: putting as 
many files as possible onto the afm-cache at this side and hoping that enough 
threads would pick up enough different files to start transferring them down 
the AFM link simultaneously. What I’d really like is for those large files (or 
small ones, up to the thresholds set) to be broken into parallel chunks that 
ALL transfer as fast as possible, utilising as much of the 10GbE as they can.
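
To make concrete what I mean by "break into parallel chunks", here is a purely illustrative sketch of the behaviour I am hoping for. It is not a description of how AFM actually schedules transfers: the `plan_chunks` function, the chunk size and the round-robin assignment are all hypothetical, and only the gateway names come from the cluster listing above.

```python
from typing import List, Tuple


def plan_chunks(file_size: int, chunk_size: int,
                gateways: List[str]) -> List[Tuple[str, int, int]]:
    """Split a file into byte ranges and assign them round-robin to gateways.

    Returns a list of (gateway, offset, length) tuples covering the whole file.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not gateways:
        raise ValueError("at least one gateway is required")

    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        plan.append((gateways[index % len(gateways)], offset, length))
        offset += length
        index += 1
    return plan


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    # Hypothetical example: a 1 GiB file in 128 MiB chunks over three gateways.
    for gateway, offset, length in plan_chunks(1 * GIB, 128 * MIB,
                                               ["mc-6", "mc-7", "mc-8"]):
        print(f"{gateway}: offset={offset} length={length}")
```

With a layout like this, every gateway would carry a share of each large file rather than one gateway handling the whole fileset.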

Maybe I am missing fundamental principles in the way AFM works?

Thanks.

-jc

PS: The link is easily capable of 10GbE. We’ve tested it all the way up to 
about 9.67Gbit/sec transferring data from these sets of hosts using other 
protocols and tools such as FDT and Globus GridFTP, among others.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
