On 21/10/2016 20:46, Bill Pappas wrote:
[...]

If you are using GPFS as the conduit between the home and cache
(i.e.  no NFS), I would still ask the same question, more with respect to
stability for large file lists during the initial prefetch stages.

Hello,
I'm in the final stage of what the AFM documentation calls an "incremental migration", for a filesystem with 100 million files. (GPFS 4.1.1, single cluster migration, "hardware/filesystem refresh" use case)

I initially tried to use the NFS transport but found it too unreliable (and, in my opinion, very poorly documented). As I was about to give up on AFM, I tried using the GPFS transport (after seeing a trivially simple example on slides by someone from ANL) and things just started to work (almost) as I expected.

For the file lists, I use data produced for our monitoring system, which relies on fast snapshot scans (we gather daily statistics on all objects in our GPFS filesystems). Our data gathering tool encodes object names in the RFC 3986 (URL encoding) format, which is what I found "mmafmctl prefetch" expects for "special" filenames. I understand that the policy engine does this too, which, I guess, is what the documentation means by "generate a file list using policy" (sic); yet "mmafmctl prefetch" does not seem to accept/like files produced by a simple "LIST" policy (and the documentation lacks an example).
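For illustration, here is a minimal shell sketch of that percent-encoding; the exact set of characters "mmafmctl prefetch" leaves unencoded is an assumption on my part (unreserved characters plus "/"), and this only handles single-byte (ASCII) names:

```shell
# Percent-encode a path the way "mmafmctl prefetch" list files appear to
# expect (assumption: RFC 3986 unreserved characters and "/" stay as-is).
# Note: handles ASCII names only; multi-byte characters would need
# per-byte encoding.
urlencode() {
  local s=$1 out= c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9._~/-]) out+=$c ;;              # unreserved + path separator
      *) printf -v c '%%%02X' "'$c"; out+=$c ;; # everything else -> %XX
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'fileset1/file name (v2).dat'   # fileset1/file%20name%20%28v2%29.dat
```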

As you did, I found that trying to prefetch large lists of files does not work reliably. I remember someone on this list (from IBM Germany, I think) recommending to limit the number of files in a single prefetch to 2 million. This appears to be the sweet spot for my needs: I can split the file list into 2-million-file parts (the largest fileset in the "home" filesystem has 26 million files) and at the same time manage the issues I mention below.
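The splitting itself needs nothing more than coreutils; a sketch (the list file name is hypothetical, and the demo uses 2-line pieces to keep it small):

```shell
# Split a prefetch list into 2,000,000-line pieces (file names hypothetical):
#   split -l 2000000 -d -a 3 homefs.list homefs.list.part.
# The same idea, demonstrated on a toy 5-line list with 2-line pieces:
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' a b c d e > demo.list
split -l 2 -d -a 3 demo.list demo.list.part.
ls demo.list.part.*        # three pieces: .000 .001 .002
```

Each piece is then fed to a separate "mmafmctl prefetch" run.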

To keep up with the updates on the "home" filesystem (modified files), I rely on the "gpfs_winflags" GPFS extended attribute (the GPFS_WINATTR_OFFLINE bit is on for modified files, see "mmlsattr -L /cachefs/objectname" output). By chance, this attribute is included in the files produced for our statistics. This allows us to avoid doing a prefetch of all the files "continuously", since the file scan indeed appears to use only the (single) gateway node for the fileset being prefetched.
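As a sketch of that filtering step: the field layout below is an assumption (decoded gpfs_winflags text in field 1, URL-encoded object name in field 2 of our stats dump), but the idea is simply to keep the entries whose flags include OFFLINE, since those are the only files needing another prefetch pass:

```shell
# Hypothetical filter over our stats dump: keep only objects whose
# gpfs_winflags text contains OFFLINE (GPFS_WINATTR_OFFLINE set, i.e.
# the file was modified at home and is not yet cached).
awk '$1 ~ /OFFLINE/ { print $2 }' <<'EOF' > needs_prefetch.list
ARCHIVE+OFFLINE fileset1/changed%20file.dat
ARCHIVE         fileset1/unchanged.dat
OFFLINE         fileset2/new.dat
EOF
cat needs_prefetch.list
```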

In my specific configuration/environment, there are still several issues:
. There is a significant memory and "slab" leak on the gateway nodes
  which can easily lead to a completely unreachable gateway node.
  These leaks appear directly related to the number of files
  prefetched. Stopping GPFS on the gateway node releases only some of
  the memory and none of the "slab", which requires a node reboot.
. There is also a need to increase the "fs.file-max" sysctl on the
  gateway nodes to a value larger than the default (I use 10M), to
  avoid the kernel running out of file descriptors, since that too
  makes the node unreachable.
. Sometimes, an AFM association will go into the "Unmounted" state
  (for no apparent reason). The only reliable way I found to bring it
  back to the "Active" state is to: unmount the "cache" filesystem from
  all nodes mounting it, unmount/remount the "home" filesystem on the
  gateway nodes, remount the "cache" filesystem where it is needed
  (gateway nodes, etc.), and finally run "ls -la" in the "cache"
  filesystem to bring the AFM associations back into the "Active"
  state. As I'm doing an "incremental migration", the "home" filesystem
  is still actively used, so unmounting it on all nodes (as the
  documentation suggests) is not an option.
. I include directories and symlinks in the file lists for the
  prefetch. This ensures symlink targets are there without needing an
  "ls" or "stat" in the "cache" filesystem ("incremental migration").
. The only reliable way I found to have "mmafmctl prefetch" accept
  files lists is to use the "--home-list-file" & "--home-fs-path"
  options.
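
For reference, the sysctl change and the "Unmounted" recovery sequence above, as a runbook sketch (node, fileset and filesystem names are placeholders; the mm-commands obviously need a live cluster, so treat this as a procedure fragment, not something to paste blindly):

```shell
# Raise the file-descriptor limit on the gateway nodes, and persist it:
sysctl -w fs.file-max=10000000
echo 'fs.file-max = 10000000' > /etc/sysctl.d/90-afm-gateway.conf

# Recovery for an association stuck in "Unmounted" (names are placeholders;
# "cachefs"/"homefs" are the cache and home filesystems):
mmumount cachefs -a                     # unmount cache everywhere
mmumount homefs -N gwnode1,gwnode2      # bounce home on the gateway nodes
mmmount homefs -N gwnode1,gwnode2
mmmount cachefs -N gwnode1,gwnode2      # remount cache where needed
ls -la /cachefs/fileset1 > /dev/null    # touch the cache to reactivate
mmafmctl cachefs getstate               # verify the association is Active
```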

In my experience, in my environment, using AFM for migrations requires a significant amount of work and hand-holding to get a useful result. Since this migration is actually only the first step in an extensive multi-filesystem migration/merge plan, I'm pondering whether I'll use AFM for the rest of the operations.


Sorry, this mail is too long,
Loïc.
--
|     Loïc Tortay <[email protected]>  -  IN2P3 Computing Centre      |


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss