Hi Leigh,
Sorry for the additional delay, I'm a little swamped with some
contractual stuff and SC-related issues today. I'll get something back to you
tomorrow.
Quincey
On Nov 9, 2012, at 11:36 AM, Leigh Orf <[email protected]> wrote:
> A major part of my I/O strategy for massively parallel supercomputers (such
> as the new Blue Waters Cray XE6 machine) is doing buffered file writes. It
> turns out that our cloud model only takes up a small fraction of the
> available memory on a node, so we can buffer dozens of files to memory before
> we have to hit the file system, dramatically reducing I/O wall-clock time.
>
> I am getting some strange behavior with the core driver, however. On some
> machines and with some compilers, it works great. One problem that I am
> having consistently on Blue Waters using the Cray compilers is that the
> amount of memory being chewed up at every h5dwrite is way, way larger than
> the actual size of the data arrays being written. Because I have limited
> access to the machine right now, I have not tested it with other compilers.
>
> Specific example of odd behavior:
>
> First, here is how the data is stored in each file. The output below only
> covers two time levels (there are many more in the file). Note: the group
> 00000 is for time = 0 seconds, the group 00020 is for time = 20 seconds, etc.
>
> h2ologin1:% h5ls -rv cm1out.00000_000000.cm1hdf5 | grep 3d
>
> /00000/3d Group
> /00000/3d/dbz Dataset {250/250, 60/60, 60/60}
> /00000/3d/dissten Dataset {250/250, 60/60, 60/60}
> /00000/3d/khh Dataset {250/250, 60/60, 60/60}
> [...]
> /00020/3d Group
> /00020/3d/dbz Dataset {250/250, 60/60, 60/60}
> /00020/3d/dissten Dataset {250/250, 60/60, 60/60}
> /00020/3d/khh Dataset {250/250, 60/60, 60/60}
> [...]
>
> and so on.
>
> Data is gathered to one core on each 16-core shared-memory module, so only
> one core per module buffers to memory and writes to disk. Time groups are
> created, data is written, groups are closed, new groups are created, and so
> on. This continues until I decide we've used up enough memory, at which
> point I close the final groups and finally the file itself with a call to
> h5fclose. Backing store is on, so when the file is closed, its contents are
> flushed to disk. As I understand it, once this is done, all the memory the
> file occupied should be freed.
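>
> In rough outline, one buffered write cycle looks like this (a simplified
> sketch, not the exact production code; only one of the 41 fields is shown):
>
> integer(hid_t)   :: tgroup_id, group_id, space_id, dset_id
> integer(hsize_t) :: dims(3)
> real             :: dbz(60,60,250)   ! local subdomain (h5ls lists it in C order: {250, 60, 60})
>
> dims = (/ 60, 60, 250 /)
>
> ! one group per time level (e.g. "00000"), with a "3d" group inside it
> call h5gcreate_f(file_3d_id, '00000', tgroup_id, ierror); check_err(ierror)
> call h5gcreate_f(tgroup_id, '3d', group_id, ierror); check_err(ierror)
>
> ! one dataset per 3d field; the other fields follow the same pattern
> call h5screate_simple_f(3, dims, space_id, ierror); check_err(ierror)
> call h5dcreate_f(group_id, 'dbz', H5T_NATIVE_REAL, space_id, dset_id, ierror)
> check_err(ierror)
> call h5dwrite_f(dset_id, H5T_NATIVE_REAL, dbz, dims, ierror); check_err(ierror)
> call h5dclose_f(dset_id, ierror); check_err(ierror)
> call h5sclose_f(space_id, ierror); check_err(ierror)
>
> ! close this time level and start the next one...
> call h5gclose_f(group_id, ierror); check_err(ierror)
> call h5gclose_f(tgroup_id, ierror); check_err(ierror)
>
> ! ...until the memory budget is reached; with backing_store = .true. this
> ! is the point where the in-core image actually hits the file system
> call h5fclose_f(file_3d_id, ierror); check_err(ierror)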
>
> The problem: in a recent simulation, I wrote 41 3d fields per time level,
> so each buffered time level should take up roughly
>
> 250*60*60*41*4 bytes = about 150 MB.
>
> As part of my code, I query /proc/meminfo (these machines run Linux) on each
> node to see how much memory is being used and how much is available, and I
> output the values after each buffer to memory. I keep track of what I call
> global_free, which is MemFree + Buffers + Cached, and do an MPI_REDUCE that
> picks the smallest value across nodes (there are small variations in
> available memory from node to node, but the results would be nearly
> identical if I just looked at any single node).
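>
> The bookkeeping is roughly the following (a stripped-down sketch of what the
> code does; the subroutine name is just for illustration):
>
> subroutine report_global_free(comm)
>   use mpi
>   implicit none
>   integer, intent(in) :: comm
>   integer(kind=8) :: memfree, buffers, cached, global_free, min_free
>   character(len=256) :: line
>   integer :: ios, ierr, rank
>
>   memfree = 0; buffers = 0; cached = 0
>   open(unit=77, file='/proc/meminfo', status='old', action='read')
>   do
>      read(77, '(a)', iostat=ios) line
>      if (ios /= 0) exit
>      if (line(1:8) == 'MemFree:') read(line(9:), *) memfree   ! kB
>      if (line(1:8) == 'Buffers:') read(line(9:), *) buffers   ! kB
>      if (line(1:7) == 'Cached:')  read(line(8:), *) cached    ! kB
>   end do
>   close(77)
>
>   ! smallest value across all writing ranks
>   global_free = memfree + buffers + cached
>   call MPI_Reduce(global_free, min_free, 1, MPI_INTEGER8, MPI_MIN, 0, comm, ierr)
>   call MPI_Comm_rank(comm, rank, ierr)
>   if (rank == 0) write(*,*) rank, 'global_free = ', min_free
> end subroutine report_global_free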
>
> With no compression and no chunking, I see the following value of global_free
> after each buffered write, which, remember, should be using up around 150 MB:
>
> 0 global_free = 60268020
> 0 global_free = 57186776
> 0 global_free = 53716128
> 0 global_free = 51117500
> 0 global_free = 48013960
> 0 global_free = 44306108
>
> etc. etc.
>
> Those values are in kB - so, for instance, we went from 60.2 GB to 57.1 GB
> (chewed up about 3GB) after writing 150 MB of data!
>
> I do not see this behavior on all machines, and I'm not sure it's an HDF5
> bug (it could be a Cray bug ... and we have submitted a bug report to Cray).
> But because I have seen flakiness with the core driver beyond this example,
> and there is precious little documentation on it, I wanted to ask whether
> anyone had any ideas on how to troubleshoot this problem. Note that this is
> with HDF5 1.8.8, the latest version installed on Blue Waters.
>
> Note that once the file is flushed to disk, its size is exactly what it
> should be given the array sizes, and the data itself is exactly what it
> should be.
>
> Finally, when I comment out only the h5dwrite call in the 3D write
> subroutine and leave everything else the same, memory usage is essentially
> flat, so it's not a memory leak on my part. I've experimented with and
> without chunking, and with and without compression. Turning gzip compression
> on (with chunking, of course) seems to take up a little less memory per
> buffered write, but still far more than it should.
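>
> For reference, "with chunking and compression" means the datasets get a
> creation property list along these lines (again a sketch; the chunk size
> and gzip level here are just illustrative):
>
> chunkdims = (/ 60, 60, 25 /)   ! illustrative chunk size
> call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, ierror); check_err(ierror)
> call h5pset_chunk_f(dcpl_id, 3, chunkdims, ierror); check_err(ierror)
> call h5pset_deflate_f(dcpl_id, 1, ierror); check_err(ierror)
> call h5dcreate_f(group_id, 'dbz', H5T_NATIVE_REAL, space_id, dset_id, ierror, dcpl_id)
> check_err(ierror)
> call h5pclose_f(dcpl_id, ierror); check_err(ierror)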
>
> Here is how I am initializing the files:
>
> backing_store = .true.
> blocksize = 4096
> call h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, ierror); check_err(ierror)
> call h5pset_fapl_core_f(plist_id, blocksize, backing_store, ierror)
> check_err(ierror)
> call h5fcreate_f(trim(filename), H5F_ACC_TRUNC_F, file_3d_id, ierror, &
>                  access_prp=plist_id)
> check_err(ierror)
> call h5pclose_f(plist_id, ierror); check_err(ierror)
>
> I am not calling h5pset_alignment_f, and I cannot recall why I chose 4096
> bytes for the memory increment size.
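>
> (For what it's worth, that 4096 is the increment argument to
> h5pset_fapl_core_f, i.e. the amount by which the in-core file image grows
> each time it needs more room. A larger value would look like the following;
> I have not tested whether it changes the memory behavior:)
>
> blocksize = 16*1024*1024   ! grow the in-core image 16 MB at a time instead of 4 KB
> call h5pset_fapl_core_f(plist_id, blocksize, backing_store, ierror); check_err(ierror)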
>
> Thanks for any pointers.
>
> Leigh
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org