Hi Quincey,

I managed to increase the chunk size - I had overlooked the fact that the blocks 
of data in my test case weren't cubes. However, it seems that performance can 
suffer a lot for certain chunk sizes:

The size of my entire data array is 40 x 40 x 160. My MPI Cartesian grid is 4 x 
4 x 1, so every core has a 10 x 10 x 160 subset. Originally I had the chunk 
size set to 10 x 10 x 160 as well (which explains why I couldn't double the 
third component), and writes take less than a second. However, if I set the 
chunk size to 20 x 20 x 160 it's really slow (7 seconds), while 40 x 40 x 160 
once again takes less than a second. I'd been reading up on chunking before, 
but I think I'm still missing some of the subtleties. Am I violating a rule 
here that makes HDF5 fall back to independent I/O?
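For reference, the setup looks roughly like this - a simplified sketch rather 
than my exact code, with placeholder file/dataset names, and with the hyperslab 
selection and the actual h5dwrite_f call left out:

  use hdf5
  use mpi

  integer(hid_t) :: file_id, fapl_id, dcpl_id, dxpl_id, filespace, dset_id
  integer(hsize_t), dimension(3) :: dims_global = (/40, 40, 160/)  ! full array
  integer(hsize_t), dimension(3) :: dims_chunk  = (/10, 10, 160/)  ! the fast case
  integer :: hdferr

  call h5open_f(hdferr)

  ! File access property list for MPI-IO
  call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
  call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)
  call h5fcreate_f("out.h5", H5F_ACC_TRUNC_F, file_id, hdferr, access_prp=fapl_id)

  ! Chunked dataset creation; no chunk dimension may exceed the dataset dimension
  call h5screate_simple_f(3, dims_global, filespace, hdferr)
  call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, hdferr)
  call h5pset_chunk_f(dcpl_id, 3, dims_chunk, hdferr)
  call h5dcreate_f(file_id, "data", H5T_NATIVE_DOUBLE, filespace, dset_id, &
                   hdferr, dcpl_id)

  ! Transfer property list requesting collective I/O for the write
  call h5pcreate_f(H5P_DATASET_XFER_F, dxpl_id, hdferr)
  call h5pset_dxpl_mpio_f(dxpl_id, H5FD_MPIO_COLLECTIVE_F, hdferr)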

Is there some rule of thumb, or a set of guidelines, for getting good 
performance out of chunked collective writes? I read your "Parallel HDF5 Hints" 
document and some others, but apparently it hasn't helped me enough :-D. The 
time spent on I/O in my application is becoming a real concern.

Thanks again for the continued support,

Stefan Frijters
________________________________________
From: [email protected] [[email protected]] On Behalf 
Of Quincey Koziol [[email protected]]
Sent: 24 March 2010 17:12
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] HDF5 causes Fatal error in MPI_Gather

Hi Stefan,

On Mar 24, 2010, at 11:06 AM, Frijters, S.C.J. wrote:

> Hi Quincey,
>
> I can double one dimension of my chunk size (at the cost of really slow I/O), 
> but if I double them all I get errors like these:
>
> HDF5-DIAG: Error detected in HDF5 (1.8.4) MPI-process 4:
>  #000: H5D.c line 171 in H5Dcreate2(): unable to create dataset
>    major: Dataset
>    minor: Unable to initialize object
>  #001: H5Dint.c line 428 in H5D_create_named(): unable to create and link to 
> dataset
>    major: Dataset
>    minor: Unable to initialize object
>  #002: H5L.c line 1639 in H5L_link_object(): unable to create new link to 
> object
>    major: Links
>    minor: Unable to initialize object
>  #003: H5L.c line 1862 in H5L_create_real(): can't insert link
>    major: Symbol table
>    minor: Unable to insert object
>  #004: H5Gtraverse.c line 877 in H5G_traverse(): internal path traversal 
> failed
>    major: Symbol table
>    minor: Object not found
>  #005: H5Gtraverse.c line 703 in H5G_traverse_real(): traversal operator 
> failed
>    major: Symbol table
>    minor: Callback failed
>  #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
>    major: Object header
>    minor: Unable to initialize object
>  #007: H5O.c line 2677 in H5O_obj_create(): unable to open object
>    major: Object header
>    minor: Can't open object
>  #008: H5Doh.c line 296 in H5O_dset_create(): unable to create dataset
>    major: Dataset
>    minor: Unable to initialize object
>  #009: H5Dint.c line 1030 in H5D_create(): unable to construct layout 
> information
>    major: Dataset
>    minor: Unable to initialize object
>  #010: H5Dchunk.c line 420 in H5D_chunk_construct(): chunk size must be <= 
> maximum dimension size for fixed-sized dimensions
>    major: Dataset
>    minor: Unable to initialize object
>
> I am currently doing test runs on 16 cores on my local machine, because the 
> large machine I run jobs on is unavailable at the moment and has a queueing 
> system rather unsuited to quick test runs. Maybe this is an artefact of 
> running on such a small number of cores? Although I *think* I tried this 
> before and got the same type of error on several thousand cores as well.

        You seem to have increased the chunk dimension to be larger than the 
dataset dimension.  What is the chunk size and dataspace size you are using?
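        (For a dataset that does get created, something along these lines 
should print both - an untested sketch with the Fortran API, where dset_id 
stands in for your dataset handle:)

  integer(hid_t) :: space_id, dcpl_id
  integer(hsize_t), dimension(3) :: dims, maxdims, chunk_dims
  integer :: hdferr

  call h5dget_space_f(dset_id, space_id, hdferr)
  call h5sget_simple_extent_dims_f(space_id, dims, maxdims, hdferr)  ! hdferr = rank on success
  call h5dget_create_plist_f(dset_id, dcpl_id, hdferr)
  call h5pget_chunk_f(dcpl_id, 3, chunk_dims, hdferr)                ! hdferr = number of chunk dims
  print *, 'dataspace dims:', dims, '  chunk dims:', chunk_dims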

        Quincey

>
> Kind regards,
>
> Stefan Frijters
>
> ________________________________________
> From: [email protected] [[email protected]] On 
> Behalf Of Quincey Koziol [[email protected]]
> Sent: 24 March 2010 16:28
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] HDF5 causes Fatal error in MPI_Gather
>
> Hi Stefan,
>
> On Mar 24, 2010, at 10:10 AM, Stefan Frijters wrote:
>
>> Hi Quincey,
>>
>> Thanks for the quick response. Currently, each core is handling its
>> datasets with a chunk size equal to the size of the local data (the dims
>> parameter in h5pset_chunk_f is equal to the dims parameter in
>> h5dwrite_f), because the local arrays are not that large anyway (on the
>> order of 20x20x20 reals), so if I understand things correctly I'm
>> already using the maximum chunk size.
>
>        No, you don't have to make them the same size, since the collective 
> I/O should stitch them back together anyway.  Try doubling the dimensions on 
> your chunks.
>
>> Do you have an idea why it doesn't crash the first time I try to do it
>> though? It's a different array, but of the same size and datatype as the
>> second. As far as I can see I'm closing all used handles at the end of
>> my function at least.
>
>        Hmm, I'm not certain...
>
>                Quincey
>
>> Kind regards,
>>
>> Stefan Frijters
>>
>>> Hi Stefan,
>>>
>>> On Mar 24, 2010, at 3:11 AM, Stefan Frijters wrote:
>>>
>>>> Dear all,
>>>>
>>>> Recently, I've run into a problem with my parallel HDF5 writes. My
>>>> program works fine on 8k cores, but when I run it on 16k cores it
>>>> crashes when writing a data file through h5dwrite_f(...).
>>>> All file writes go through a single function in the code, but for some
>>>> reason I don't understand, it writes one file without problems while the
>>>> second one throws the following error message:
>>>>
>>>> Abort(1) on node 0 (rank 0 in comm 1140850688): Fatal error in
>>>> MPI_Gather: Invalid buffer pointer, error stack:
>>>> MPI_Gather(758): MPI_Gather(sbuf=0xa356f400, scount=16000, MPI_BYTE,
>>>> rbuf=(nil), rcount=16000, MPI_BYTE, root=0, comm=0x84000003) failed
>>>> MPI_Gather(675): Null buffer pointer
>>>>
>>>> I've been looking through the HDF5 source code and it only seems to call
>>>> MPI_Gather in one place, in the function H5D_obtain_mpio_mode. In that
>>>> function HDF tries to allocate a receive buffer using
>>>>
>>>> recv_io_mode_info = (uint8_t *)H5MM_malloc(total_chunks * mpi_size);
>>>>
>>>> That call then returns the null pointer seen as rbuf=(nil) instead of a
>>>> valid pointer. Thus, to me it seems it's HDF5 causing the problem and not
>>>> MPI.
>>>>
>>>> This problem occurs in both collective and independent IO mode.
>>>>
>>>> Do you have any idea what might be causing this problem, or how to
>>>> resolve it? I'm not sure what kind of other information you might need,
>>>> but I'll do my best to supply it, if you need any.
>>>
>>> This is a scalability problem we are aware of and are working to address,
>>> but in the meantime, can you increase the size of your chunks for your
>>> dataset(s)?  (That will reduce the number of chunks and therefore the size
>>> of the buffer being allocated.)
>>>
>>>     Quincey
>>>
>>
>>
>>


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
