As primary developer of mmapplypolicy, please allow me to comment:

1) Fast access to metadata in the system pool is most important, as several have commented. These days SSD is the favorite, but you can still go with "spinning" media. If you do go with disks, it's extremely important to spread your metadata over independent disk "arms", so you can have many concurrent seeks in progress at the same time. IOW, if there is a virtualization/mapping layer, watch out that your logical disks don't get mapped to the same physical disk.

2) It is crucial to use both -g and -N:

   -g /gpfs-not-necessarily-the-same-fs-as-Im-scanning/tempdir
   -N several-nodes-that-will-be-accessing-the-system-pool

3a) If at all possible, encourage your data and application designers to "pack" their directories with lots of files. Keep in mind that mmapplypolicy will read every directory. The more directories, the more seeks and the more time spent waiting for IO. OTOH, in more typical Unix/Linux usage, we tend toward a low average number of files per directory.

3b) As admin, you may not be able to change your data design to pack hundreds of files per directory, BUT you can make sure you are running a sufficiently modern release of Spectrum Scale that supports "data in inode". "Data in inode" also means "directory entries in inode", which means practically any small directory, up to a few hundred files, will fit in an inode, which in turn means mmapplypolicy can read small directories with one seek instead of two. (Someone will please remind us of the release number that first supported "directories in inode".)

4) Sorry, Fred, but the recommendation to use RAID mirroring of metadata on SSD is not necessarily important for metadata scanning. In fact it may work against you. If you use GPFS replication of metadata, that can work for you, since GPFS can then direct read operations to either copy, preferring a locally attached copy, depending on how storage is attached to the node, etc. The choice of how to replicate metadata, either using GPFS replication or the RAID controller, is probably best made based on reliability and recoverability requirements.

5) YMMV. We'd love to hear/see your performance results for mmapplypolicy, especially if they're good. Even if they're bad, come back here for more tuning tips!

-- marc of Spectrum Scale (né GPFS)

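
To make point 2 concrete, here is a minimal sketch of how the -g and -N options might be combined on one command line. Every path, node name, and the policy file name below is a hypothetical placeholder, not something from this thread; substitute your own. The script only prints the command so it can be reviewed before anyone runs it, and it uses -I test so that the rules are evaluated without moving or deleting any data.

```shell
#!/bin/sh
# Sketch only: all values below are hypothetical placeholders.
FS_TO_SCAN=/gpfs/fs1             # file system (or directory) to scan
POLICY_FILE=/tmp/scan.policy     # your policy rules file
WORK_DIR=/gpfs/fs2/policytmp     # -g: shared work dir; may live in a different GPFS file system
NODES=nsd01,nsd02,nsd03          # -N: nodes with good access to the system pool

# Print the command rather than executing it, so it can be checked first.
# -I test evaluates the rules but does not move or delete files.
scan_cmd() {
    echo "mmapplypolicy $FS_TO_SCAN -P $POLICY_FILE -I test -g $WORK_DIR -N $NODES"
}

scan_cmd
```

Once the printed command looks right for your cluster, it can be run as is. Changing -I test to -I yes makes the policy actions take effect.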
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
