Rob,

Your explanation was very helpful, at least to me.
Are there documents listing all recognized hints by IBM and/or ROMIO?
Also, can the xxx_size hints recognize something like “40MB” instead of 
“41943040”?
(Of course, there is this ambiguity whether MB means 2^20 or 10^6.)
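
In case the hints only accept plain decimal byte counts (I have not verified this for either ROMIO or IBM's MPI-IO), the conversion could be done on the application side before calling MPI_Info_set. A minimal sketch in C, where size_to_bytes is a hypothetical helper name of my own and K/M/G are assumed to be binary multiples (2^10, 2^20, 2^30):

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of MPI, ROMIO, or HDF5): convert a size
 * such as "40MB" into a plain decimal byte count such as "41943040".
 * Assumption: K, M and G are binary multiples (2^10, 2^20, 2^30).
 * Returns 0 on success, -1 if the input cannot be parsed or does not fit. */
int size_to_bytes(const char *text, char *out, size_t outlen)
{
    char *end;
    unsigned long long value = strtoull(text, &end, 10);
    if (end == text)
        return -1;                      /* no leading digits */

    switch (toupper((unsigned char)*end)) {
    case 'G': value <<= 30; end++; break;
    case 'M': value <<= 20; end++; break;
    case 'K': value <<= 10; end++; break;
    default: break;                     /* plain byte count */
    }
    if (toupper((unsigned char)*end) == 'B')
        end++;                          /* optional trailing "B" */
    if (*end != '\0')
        return -1;                      /* unexpected trailing characters */

    int n = snprintf(out, outlen, "%llu", value);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string would then be passed as the hint value, for example for "ind_rd_buffer_size". Switching to decimal multiples (10^6) would only change the three shift lines.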

-Albert Cheng

On Nov 13, 2014, at 9:25 AM, Rob Latham <[email protected]> wrote:

> 
> 
> On 11/13/2014 06:34 AM, Angel de Vicente wrote:
> 
>> 
>> Thanks. I'm not sure if these could be tuned a bit better, but with the
>> following hints the problem is all gone in the two problematic clusters.
>> (For a given file size, one of the writing modes of the program was
>> taking ~200x more time. With these hints all is back to normal,
>> and the problematic mode takes just the same time as the other ones.)
>> 
> 
> You can pass anything you want for the "key": implementations will ignore 
> hints they do not understand.   For the sake of anyone googling in the 
> future, I will explain what, if anything, the hints you passed in do:
> 
> 
>> call MPI_Info_create(info, error)
>> call MPI_Info_set(info,"IBM_largeblock_io","true", error)
> 
> this hint is useful for IBM PE platforms and tells GPFS you are about to do 
> large I/O.  Over time, this hint will become less useful: IBM is moving away 
> from their own MPI-IO implementation and incorporating ROMIO.
> 
> >> call MPI_Info_set(info,"striping_unit","4194304", error)
> 
> this one is probably the biggest help.  In Collective I/O, ROMIO splits up 
> the file into "file domains" (and assigns those domains to a subset of 
> processors called I/O aggregators).  When the "striping_unit" hint is set, 
> ROMIO will align those file domains to that striping_unit.
> 
> Sometimes, like on Blue Gene, ROMIO will detect the file system block size 
> for you, and this hint is not needed.  No harm in providing it, though.
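
As an illustration of the file-domain alignment described above, here is a simplified sketch in C. It is not ROMIO's actual algorithm; the function names align_to_stripe and file_domains are hypothetical, and the even split followed by rounding to the nearest stripe boundary is an assumption made only to show the idea:

```c
/* Illustration only, not ROMIO's code: round an offset to the nearest
 * multiple of the striping unit. */
long long align_to_stripe(long long offset, long long striping_unit)
{
    long long rem = offset % striping_unit;
    if (rem * 2 < striping_unit)
        return offset - rem;                    /* round down */
    return offset + (striping_unit - rem);      /* round up */
}

/* Split the byte range [0, file_size) among naggs aggregators.
 * starts must have room for naggs + 1 entries; aggregator i is
 * assigned the file domain [starts[i], starts[i + 1]). Interior
 * boundaries are moved onto stripe boundaries. */
void file_domains(long long file_size, int naggs,
                  long long striping_unit, long long *starts)
{
    starts[0] = 0;
    for (int i = 1; i < naggs; i++)
        starts[i] = align_to_stripe(file_size * i / naggs, striping_unit);
    starts[naggs] = file_size;
}
```

With a 10000000-byte file, 3 aggregators, and a striping unit of 4194304, the interior boundaries move from 3333333 and 6666666 to 4194304 and 8388608, so no two aggregators write into the same file system block.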
> 
> 
> >> CALL MPI_INFO_SET(info,"H5F_ACS_CORE_WRITE_TRACKING_PAGE_SIZE_DEF","524288",error)
> 
> I don't think this hint does anything.
> 
>> CALL MPI_INFO_SET(info,"ind_rd_buffer_size","41943040", error)
>> CALL MPI_INFO_SET(info,"ind_wr_buffer_size","5242880", error)
>> CALL MPI_INFO_SET(info,"romio_ds_read","disable", error)
>> CALL MPI_INFO_SET(info,"romio_ds_write","disable", error)
> 
> No harm here, but if you are going to disable data sieving (romio_ds_read and 
> romio_ds_write) then there's no reason to tweak the independent read and 
> write buffer sizes.
> 
>> CALL MPI_INFO_SET(info,"romio_cb_write","enable", error)
> 
> On many platforms (but not Blue Gene), ROMIO will look at the access pattern.
> If the pattern is not interleaved, ROMIO will not use collective buffering
> unless this hint forces it on. At today's scale, collective buffering is
> almost always a win, especially on GPFS when combined with striping_unit.
> 
>> CALL MPI_INFO_SET(info,"cb_buffer_size","4194304", error)
> 
> this buffer size might actually be a bit small, depending on how much data 
> you are writing/reading.  If you have memory to spare, increasing this value 
> is often a good way to improve performance.
> 
>> For the moment, problem solved. Thanks a lot,
> 
> tuning these stacks is honestly way harder than it should be. thanks for your 
> persistence.
> 
> ==rob
> 
> -- 
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5

