On Mon, Aug 26, 2013 at 06:15:30AM +0000, Biddiscombe, John A. wrote:
> Rob,
> 
> Did you make any significant discoveries/progress regarding the GPFS tweaks 
> on BG systems. Our machine will be open for use within the next week or so 
> and I'd like to begin some profiling. I'd be interested in knowing if you 
> have discovered any useful facts that I ought to know about.

An upcoming driver update (I don't know which one yet) will allow the
Blue Gene compute nodes to send the gpfs_fcntl commands all the way
through to the GPFS file system (presently the gpfs_fcntl commands
return "not supported").  Then we can run some experiments to see
whether they still provide any benefit at Blue Gene scales (the
optimizations are 15 years old at this point, designed when a
"massively parallel system" was 32 nodes).

More generally, I've found that some of the default MPI-IO settings
are probably not ideal for Blue Gene /Q, and have tested and suggested
a change to the "number of I/O aggregators" defaults.

Meanwhile, ALCF (the folks who operate the machine) have been working
with IBM to improve the state of collective I/O.  Seems like we're
making some progress there as well.

> I'm concerned about how much the --enable-gpfs option is able to
> 'know' about the system (can we easily find out what the option
> does?). According to my superficial understanding of the BG
> architecture, it seems that since the compute nodes have IO calls
> forwarded off to the IO nodes by kernel level routines, collective
> operations performed by hdf5 might actually reduce the effectiveness
> of the IO by forcing the data to be shuffled around twice instead of
> once. Am I thinking along the right lines?

The --enable-gpfs option will attempt to do a few things:

gpfs_access_range
gpfs_free_range

This is the "multiple access range" hint, which tells GPFS "hey, don't
grab a lock on the whole file.  instead, just these sections".  I
*think* this is going to be one of the better improvements remaining.

gpfs_clear_file_cache
gpfs_invalidate_file_cache

Good for benchmarking.  Ejects all cached entries from the GPFS page
pool, so repeated runs aren't distorted by a warm cache.

gpfs_cancel_hints

Just resets any previously issued hints back to the defaults.

gpfs_start_data_shipping
gpfs_start_data_ship_map
gpfs_stop_data_shipping

Unfortunately, GPFS-3.5 does not support data shipping any longer.

I still think these hints, if they still help at all, belong in the
MPI-IO library, but if one is being pragmatic it may be easier to
deploy them through HDF5.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org