It is with humble apology and great relief that I report I was wrong about the 
AFM limitation I believed existed in the configuration explained below.



The problem with my configuration was that the NSD client cluster had not been 
completely updated to GPFS 4.1.0-3; a few nodes in the cluster are still running 
3.5.0-20, which currently prevents upgrading the GPFS file system release 
version (e.g. mmchconfig release=LATEST) to 4.1.0-3.  This GPFS configuration 
“requirement” isn’t documented in the Advanced Admin Guide, but it makes sense 
that it is required, since only the GPFS 4.1 release supports the GPFS protocol 
for AFM fileset targets.
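
For anyone who hits the same thing, this is roughly how we verified it (the 
mmdsh invocation and the RPM name are illustrative, not an exact transcript of 
what we ran):

  mmlsconfig | grep minReleaseLevel     # release level currently in effect for the cluster
  mmdsh -N all rpm -q gpfs.base         # GPFS level installed on every node
  mmchconfig release=LATEST             # only succeeds once all nodes are at 4.1.0-3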



I have tested the configuration with a new NSD client cluster, and it works as 
desired.



Thanks Kalyan and others for the feedback.  Our file system namespace is 
unfortunately filled with small files, which do not allow AFM to parallelize the 
data transfers across multiple nodes.  And unfortunately AFM only allows one 
Gateway node per fileset to perform the prefetch namespace scan operation, which 
is incredibly slow as I stated before.  We were only seeing roughly 100 "Queue 
numExec" operations per second.  I think this performance is gated by the 
directory namespace scan of the single gateway node.
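
(At 100 operations per second, the 12 million file directory mentioned further 
down works out to roughly 12,000,000 / 100 = 120,000 seconds, which is where the 
~33 hours comes from.)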



Thanks!

-Bryan



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Kalyan Gunda
Sent: Tuesday, October 07, 2014 10:21 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, 
slow prefetch operations



some clarifications inline:



Regards

Kalyan

GPFS Development

EGL D Block, Bangalore









From:    Bryan Banister 
<[email protected]<mailto:[email protected]>>

To:          gpfsug main discussion list 
<[email protected]<mailto:[email protected]>>

Date:     10/07/2014 08:12 PM

Subject:               Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Sent by:               
[email protected]<mailto:[email protected]>







Interesting that AFM is supposed to work in a multi-cluster environment.

We were using GPFS on the backend.  The new GPFS file system was AFM linked over 
the GPFS protocol to the old GPFS file system using the standard multi-cluster 
mount.  The "gateway" nodes in the new cluster mounted the old file system.  All 
systems were connected over the same QDR IB fabric.

The client compute nodes in the third cluster mounted both the old and new file 
systems.  I looked for waiters on the client and NSD servers of the new file 
system when the problem occurred, but none existed.  I tried stracing the `ls` 
process, but it reported nothing and the strace itself became unkillable.  There 
were no error messages in any GPFS or system logs related to the `ls` failure.  
NFS clients accessing cNFS servers in the new cluster also worked as expected.  
The `ls` from an NFS client in an AFM fileset returned the expected directory 
listing.  Thus all symptoms indicated the configuration wasn't supported.  I may 
try to replicate the problem in a test environment at some point.



However AFM isn't really a great solution for file data migration between file 
systems, for these reasons:

1) It requires the complicated AFM setup, which requires manual operations to 
sync data between the file systems (e.g. an mmapplypolicy run on the old file 
system to get the file list, THEN an mmafmctl prefetch operation on the new AFM 
fileset to pull the data).  There is no way to have it simply keep the two 
namespaces in sync.  You must also be careful with the "Local Update" 
configuration not to modify basically ANY file attributes in the new AFM fileset 
until a CLEAN cutover of your application is performed; otherwise AFM will 
remove the link of the file to the data stored on the old file system.  This is 
concerning, and it is not easy to detect that this event has occurred.



--> The LU mode is meant for scenarios where changes in cache are not meant to 
be pushed back to the old filesystem.  If that's not what's desired then other 
AFM modes like IW can be used to keep the namespace in sync, and data can flow 
from both sides.  Typically, for data migration, --metadata-only is used to pull 
in the full namespace first, and data can be migrated on demand or via policy as 
outlined above using the prefetch cmd.  AFM setup should be an extension to the 
GPFS multi-cluster setup when using the GPFS backend.
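
As a rough sketch of that flow (the fileset name and list file below are made 
up; check the exact option spellings against the mmafmctl documentation for your 
release):

  mmafmctl fs1 prefetch -j migrateFset --metadata-only                    # pull in the namespace only
  mmafmctl fs1 prefetch -j migrateFset --home-inode-file /tmp/oldfs.list  # then queue the actual data transfer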



2) The "Progressive migration with no downtime" directions actually states that 
there is downtime required to move applications to the new cluster, THUS 
DOWNTIME!  And it really requires a SECOND downtime to finally disable AFM on 
the file set so that there is no longer a connection to the old file system, 
THUS TWO DOWNTIMES!

--> I am not sure I follow the first downtime.  If applications have to start 
using the new filesystem, then they have to be informed accordingly.  If this 
can be done without bringing down applications, then there is no DOWNTIME.

Regarding the second downtime, you are right: disabling AFM after data migration 
requires an unlink and hence downtime.  But there is an easy workaround, where 
revalidation intervals can be increased to the max, or GW nodes can be 
unconfigured without downtime, with the same effect.  Disabling AFM can then be 
done at a later point during a maintenance window.  We plan to modify this so it 
can be done online, i.e. without requiring an unlink of the fileset.  This will 
get prioritized if there is enough interest in AFM being used in this direction.
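
For reference, those revalidation intervals correspond to the refresh-interval 
fields shown in the mmlsfileset output further down this thread; raising them 
would look something like the following (the afm* parameter names and the 
86400-second value are illustrative and should be checked against the 
mmchfileset documentation):

  mmchfileset fs1 migrateFset -p afmFileLookupRefreshInterval=86400
  mmchfileset fs1 migrateFset -p afmDirLookupRefreshInterval=86400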



3) The prefetch operation can only run on a single node, and thus is not able to 
take any advantage of the large number of NSD servers supporting both file 
systems for the data migration.  Multiple threads from a single node just don't 
cut it due to single-node bandwidth limits.  When I was running the prefetch it 
was only executing roughly 100 "Queue numExec" operations per second.  The 
prefetch operation for a directory with 12 million files was going to take over 
33 HOURS just to process the file list!

--> Prefetch can run on multiple nodes by configuring multiple GW nodes and 
enabling parallel i/o as specified in the docs (link provided below).  In fact 
it can parallelize data transfer to a single file and also do multiple files in 
parallel, depending on file sizes and various tuning params.


4) In comparison, parallel rsync operations will require only ONE downtime, to 
run a final sync over MULTIPLE nodes in parallel at the time that applications 
are migrated between file systems, and they do not require the complicated AFM 
configuration.  Yes, there is of course some effort to break up the namespace 
for the rsync operations.  This is really what AFM should be doing for us: 
chopping up the namespace intelligently and spawning prefetch operations across 
multiple nodes in a configurable way, to ensure performance targets are met or 
to limit the overall impact of the operation if desired.
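
To make the comparison concrete, this is the sort of thing we mean by parallel 
rsync (the paths and the parallelism level are purely illustrative):

  # one rsync per top-level directory, 8 at a time; spread these across
  # several nodes (e.g. via ssh) for more aggregate bandwidth
  cd /oldfs/data
  ls -d */ | xargs -P8 -I{} rsync -a {} /newfs/data/{}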



--> AFM can be used for data migration without any downtime dictated by AFM (see 
above), and it can in fact use multiple threads on multiple nodes to do parallel 
i/o.



AFM, however, is great for what it is intended to be: a cached data access 
mechanism across a WAN.



Thanks,

-Bryan



-----Original Message-----

From: [email protected]<mailto:[email protected]> [mailto:[email protected]] On Behalf Of Kalyan Gunda

Sent: Tuesday, October 07, 2014 12:03 AM

To: gpfsug main discussion list

Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, 
slow prefetch operations



Hi Bryan,

AFM supports GPFS multi-cluster, and we have customers already using this 
successfully.  Are you using a GPFS backend?

Can you explain your configuration in detail?  If ls is hung it would have 
generated some long waiters.  Maybe this should be pursued separately via PMR.  
You can send me the details directly if needed, along with opening a PMR per the 
IBM service process.



As far as prefetch is concerned, right now it's limited to one prefetch job per 
fileset.  Each job in itself is multi-threaded and can use multiple nodes to 
pull in data, based on configuration.

The "afmNumFlushThreads" tunable controls the number of threads used by AFM.

This parameter can be changed via the mmchfileset cmd (the mmchfileset pubs 
don't show this param for some reason; I will have that updated.)



eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5
Fileset prefetchIW changed.



List the change:

mmlsfileset fs1 prefetchIW --afm -L

Filesets in file system 'fs1':



Attributes for fileset prefetchIW:
===================================
Status                                  Linked
Path                                    /gpfs/fs1/prefetchIW
Id                                      36
afm-associated                          Yes
Target                                  nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch
Mode                                    independent-writer
File Lookup Refresh Interval            30 (default)
File Open Refresh Interval              30 (default)
Dir Lookup Refresh Interval             60 (default)
Dir Open Refresh Interval               60 (default)
Async Delay                             15 (default)
Last pSnapId                            0
Display Home Snapshots                  no
Number of Gateway Flush Threads         5
Prefetch Threshold                      0 (default)
Eviction Enabled                        yes (default)



AFM parallel i/o can be set up such that multiple GW nodes can be used to pull 
in data; more details are available here:
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm





and this link outlines tuning params for parallel i/o along with others:

http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning





Regards

Kalyan

GPFS Development

EGL D Block, Bangalore









From:   Bryan Banister 
<[email protected]<mailto:[email protected]>>

To:     gpfsug main discussion list 
<[email protected]<mailto:[email protected]>>

Date:   10/06/2014 09:57 PM

Subject:        Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Sent by:        
[email protected]<mailto:[email protected]>







We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan



From: [email protected]<mailto:[email protected]> [mailto:[email protected]] On Behalf Of Sven Oehme

Sent: Monday, October 06, 2014 11:28 AM

To: gpfsug main discussion list

Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, 
slow prefetch operations



Hi Bryan,



in 4.1 AFM uses multiple threads for reading data; this was different in 3.5.  
What version are you using?



thx. Sven





On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister 
<[email protected]<mailto:[email protected]>>

wrote:

Just an FYI to the GPFS user community,



We have been testing GPFS AFM in our required process of file data migration 
between two GPFS file systems.  The two GPFS file systems are managed in two 
separate GPFS clusters.  We have a third GPFS cluster for compute systems.  We 
created new independent AFM filesets in the new GPFS file system that are linked 
to directories in the old file system.  Unfortunately, access to the AFM 
filesets from the compute cluster hangs completely.  Access to the other parts 
of the new file system is fine.  This limitation/issue is not documented in the 
Advanced Admin Guide.



Further, we performed prefetch operations using the mmafmctl command with a file 
list, but the process appears to be single threaded and the operation was 
extremely slow as a result.  According to the Advanced Admin Guide, it is not 
possible to run multiple prefetch jobs on the same fileset:

GPFS can prefetch the data using the mmafmctl Device prefetch –j FilesetName 
command (which specifies a list of files to prefetch). Note the following about 
prefetching:

v It can be run in parallel on multiple filesets (although more than one 
prefetching job cannot be run in parallel on a single fileset).



We were able to quickly create the “--home-inode-file” from the old file system 
using the mmapplypolicy command as the documentation describes.

However, the AFM prefetch operation is so slow that we are better off running 
parallel rsync operations between the file systems than using the GPFS AFM 
prefetch operation.
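
For what it's worth, the list generation step was along these lines (the policy 
rule and paths are simplified from memory; the exact record format that mmafmctl 
prefetch expects is described in the Advanced Admin Guide):

  # /tmp/listall.pol contains:
  #   RULE EXTERNAL LIST 'prefetch' EXEC ''
  #   RULE 'all' LIST 'prefetch'
  mmapplypolicy /oldfs/projectX -P /tmp/listall.pol -f /tmp/projectX -I defer
  # writes /tmp/projectX.list.prefetch, which is then fed to the mmafmctl prefetch command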



Cheers,

-Bryan
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
