Hi Stefan,
On Mar 26, 2010, at 9:25 AM, Frijters, S.C.J. wrote:
> Hi Quincey,
>
> I'm using H5T_NATIVE_DOUBLE to write an array of real*8 and H5T_NATIVE_REAL
> for real*4. Is that okay?
That will work, but if the datatype of the dataset in the file differs from
the memory datatype, it will cause the I/O to be performed independently
rather than collectively (even if you request collective I/O). Try creating
the dataset in the file with the same datatype you use in memory and see if
the performance is better.
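
To make that concrete, here's a rough sketch of what I mean (the dataset name
"density", the handle names, and the dimensions are only placeholders, and it
assumes the file was opened with an MPI-IO file access property list and that
the memory and file hyperslab selections are already in place):

    use hdf5
    integer(hid_t)   :: file_id, filespace, memspace, dset_id, dcpl_id, xfer_id
    integer(hsize_t) :: dims(3), chunk_dims(3), local_dims(3)
    integer          :: hdferr
    real*8           :: local_data(10,10,160)

    ! Create the dataset in the file with the *same* datatype that is later
    ! passed to h5dwrite_f, so no datatype conversion is needed and collective
    ! I/O is not broken down into independent I/O.
    call h5screate_simple_f(3, dims, filespace, hdferr)
    call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, hdferr)
    call h5pset_chunk_f(dcpl_id, 3, chunk_dims, hdferr)
    call h5dcreate_f(file_id, "density", H5T_NATIVE_DOUBLE, filespace, &
                     dset_id, hdferr, dcpl_id)

    ! Request a collective transfer and write with the matching memory datatype.
    call h5pcreate_f(H5P_DATASET_XFER_F, xfer_id, hdferr)
    call h5pset_dxpl_mpio_f(xfer_id, H5FD_MPIO_COLLECTIVE_F, hdferr)
    call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, local_data, local_dims, hdferr, &
                    mem_space_id=memspace, file_space_id=filespace, xfer_prp=xfer_id)
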
Quincey
> Kind regards,
>
> Stefan Frijters
> ________________________________________
> From: [email protected] [[email protected]] On
> Behalf Of Quincey Koziol [[email protected]]
> Sent: 25 March 2010 22:40
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] HDF5 causes Fatal error in MPI_Gather
>
> Hi Stefan,
>
> On Mar 25, 2010, at 5:08 AM, Frijters, S.C.J. wrote:
>
>> Hi Quincey,
>>
>> I managed to increase the chunk size - I had overlooked the fact that my blocks
>> of data weren't cubes in my test case. However, it seems that performance
>> can suffer a lot for certain chunk sizes (in my test case):
>>
>> The size of my entire data array is 40 x 40 x 160. My MPI cartesian grid is
>> 4 x 4 x 1, so every core has a 10 x 10 x 160 subset. Originally I had the
>> chunk size set to 10 x 10 x 160 as well (which explains why I couldn't
>> double the 3rd component), and writes take less than a second. However, if I
>> set the chunk size to 20 x 20 x 160, it's really slow (7 seconds), while 40
>> x 40 x 160 once again takes less than a second. I'd been reading up on the
>> whole chunking thing before, but I think I'm still ignorant of some of the
>> subtleties. Am I violating some rule here that makes HDF5 fall back to
>> independent I/O?
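>>
>> For reference, my setup looks roughly like this (simplified, with the handle
>> names standing in for my actual ones):
>>
>>   ! global dataset is 40 x 40 x 160; each of the 4 x 4 x 1 ranks owns a
>>   ! 10 x 10 x 160 block, and the chunk size matches that block
>>   ! (the fast case; 20 x 20 x 160 is the slow one)
>>   integer(hsize_t) :: dims(3)       = (/ 40, 40, 160 /)
>>   integer(hsize_t) :: chunk_dims(3) = (/ 10, 10, 160 /)
>>
>>   call h5screate_simple_f(3, dims, filespace, hdferr)
>>   call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, hdferr)
>>   call h5pset_chunk_f(dcpl_id, 3, chunk_dims, hdferr)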
>
> Hmm, are your datatypes the same in memory and the file? If they
> aren't, HDF5 will break collective I/O down into independent I/O.
>
>> Is there some rule of thumb, or set of guidelines to get good performance
>> out of it? I read your "Parallel HDF5 Hints" document and some others, but
>> it hasn't helped me enough, apparently :-D. The time spent on IO in my
>> application is getting to be somewhat of a hot item.
>
> That would be where I'd point you.
>
> Quincey
>
>> Thanks again for the continued support,
>>
>> Stefan Frijters
>> ________________________________________
>> From: [email protected] [[email protected]] On
>> Behalf Of Quincey Koziol [[email protected]]
>> Sent: 24 March 2010 17:12
>> To: HDF Users Discussion List
>> Subject: Re: [Hdf-forum] HDF5 causes Fatal error in MPI_Gather
>>
>> Hi Stefan,
>>
>> On Mar 24, 2010, at 11:06 AM, Frijters, S.C.J. wrote:
>>
>>> Hi Quincey,
>>>
>>> I can double one dimension on my chunk size (at the cost of really slow
>>> IO), but if I double them all I get errors like these:
>>>
>>> HDF5-DIAG: Error detected in HDF5 (1.8.4) MPI-process 4:
>>> #000: H5D.c line 171 in H5Dcreate2(): unable to create dataset
>>> major: Dataset
>>> minor: Unable to initialize object
>>> #001: H5Dint.c line 428 in H5D_create_named(): unable to create and link to
>>> dataset
>>> major: Dataset
>>> minor: Unable to initialize object
>>> #002: H5L.c line 1639 in H5L_link_object(): unable to create new link to
>>> object
>>> major: Links
>>> minor: Unable to initialize object
>>> #003: H5L.c line 1862 in H5L_create_real(): can't insert link
>>> major: Symbol table
>>> minor: Unable to insert object
>>> #004: H5Gtraverse.c line 877 in H5G_traverse(): internal path traversal
>>> failed
>>> major: Symbol table
>>> minor: Object not found
>>> #005: H5Gtraverse.c line 703 in H5G_traverse_real(): traversal operator
>>> failed
>>> major: Symbol table
>>> minor: Callback failed
>>> #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
>>> major: Object header
>>> minor: Unable to initialize object
>>> #007: H5O.c line 2677 in H5O_obj_create(): unable to open object
>>> major: Object header
>>> minor: Can't open object
>>> #008: H5Doh.c line 296 in H5O_dset_create(): unable to create dataset
>>> major: Dataset
>>> minor: Unable to initialize object
>>> #009: H5Dint.c line 1030 in H5D_create(): unable to construct layout
>>> information
>>> major: Dataset
>>> minor: Unable to initialize object
>>> #010: H5Dchunk.c line 420 in H5D_chunk_construct(): chunk size must be <=
>>> maximum dimension size for fixed-sized dimensions
>>> major: Dataset
>>> minor: Unable to initialize object
>>>
>>> I am currently doing test runs on my local machine on 16 cores, because the
>>> large machine I run jobs on is unavailable at the moment and has a queueing
>>> system rather unsuited to quick test runs. Maybe this is an artefact of
>>> running on such a small number of cores? Although I *think* I tried this
>>> before and got the same type of error on several thousand cores as well.
>>
>> You seem to have increased the chunk dimension to be larger than the
>> dataset dimension. What is the chunk size and dataspace size you are using?
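>>
>> Just to illustrate the rule that error is checking (handle name and numbers
>> are only for illustration): for fixed-size dimensions, each chunk dimension
>> must be <= the corresponding dataspace dimension, e.g.
>>
>>   ! with a 40 x 40 x 160 dataspace this is the largest legal chunk;
>>   ! something like 80 x 80 x 320 fails with the "chunk size must be <=
>>   ! maximum dimension size" error shown above
>>   call h5pset_chunk_f(dcpl_id, 3, (/ 40_HSIZE_T, 40_HSIZE_T, 160_HSIZE_T /), hdferr)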
>>
>> Quincey
>>
>>>
>>> Kind regards,
>>>
>>> Stefan Frijters
>>>
>>> ________________________________________
>>> From: [email protected] [[email protected]] On
>>> Behalf Of Quincey Koziol [[email protected]]
>>> Sent: 24 March 2010 16:28
>>> To: HDF Users Discussion List
>>> Subject: Re: [Hdf-forum] HDF5 causes Fatal error in MPI_Gather
>>>
>>> Hi Stefan,
>>>
>>> On Mar 24, 2010, at 10:10 AM, Stefan Frijters wrote:
>>>
>>>> Hi Quincey,
>>>>
>>>> Thanks for the quick response. Currently, each core is handling its
>>>> datasets with a chunk size equal to the size of the local data (the dims
>>>> parameter in h5pset_chunk_f is equal to the dims parameter in
>>>> h5dwrite_f), because the local arrays are not that large anyway (on the
>>>> order of 20x20x20 reals), so if I understand things correctly I'm
>>>> already using the maximum chunk size.
>>>
>>> No, you don't have to make them the same size, since the collective
>>> I/O should stitch them back together anyway. Try doubling the dimensions
>>> on your chunks.
>>>
>>>> Do you have an idea why it doesn't crash the first time I try to do it,
>>>> though? The first array is a different one, but it has the same size and
>>>> datatype as the second. As far as I can see, I'm at least closing all used
>>>> handles at the end of my function.
>>>
>>> Hmm, I'm not certain...
>>>
>>> Quincey
>>>
>>>> Kind regards,
>>>>
>>>> Stefan Frijters
>>>>
>>>>> Hi Stefan,
>>>>>
>>>>> On Mar 24, 2010, at 3:11 AM, Stefan Frijters wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Recently, I've run into a problem with my parallel HDF5 writes. My
>>>>>> program works fine on 8k cores, but when I run it on 16k cores it
>>>>>> crashes when writing a data file through h5dwrite_f(...).
>>>>>> All file writes go through one function in the code, so the same code path
>>>>>> is always used, yet for a reason I don't understand it writes the first file
>>>>>> without problems while the second one throws the following error message:
>>>>>>
>>>>>> Abort(1) on node 0 (rank 0 in comm 1140850688): Fatal error in
>>>>>> MPI_Gather: Invalid buffer pointer, error stack:
>>>>>> MPI_Gather(758): MPI_Gather(sbuf=0xa356f400, scount=16000, MPI_BYTE,
>>>>>> rbuf=(nil), rcount=16000, MPI_BYTE, root=0, comm=0x84000003) failed
>>>>>> MPI_Gather(675): Null buffer pointer
>>>>>>
>>>>>> I've been looking through the HDF5 source code and it only seems to call
>>>>>> MPI_Gather in one place, in the function H5D_obtain_mpio_mode. In that
>>>>>> function HDF tries to allocate a receive buffer using
>>>>>>
>>>>>> recv_io_mode_info = (uint8_t *)H5MM_malloc(total_chunks * mpi_size);
>>>>>>
>>>>>> This malloc then returns the null pointer seen as rbuf=(nil) instead of a
>>>>>> valid pointer. Thus, it seems to me that it's HDF5 causing the problem and
>>>>>> not MPI.
>>>>>>
>>>>>> This problem occurs in both collective and independent IO mode.
>>>>>>
>>>>>> Do you have any idea what might be causing this problem, or how to
>>>>>> resolve it? I'm not sure what kind of other information you might need,
>>>>>> but I'll do my best to supply it, if you need any.
>>>>>
>>>>> This is a scalability problem we are aware of and are working to address,
>>>>> but in the meantime, can you increase the size of the chunks for your
>>>>> dataset(s)? (That will reduce the number of chunks and therefore the size
>>>>> of the buffer being allocated.)
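>>>>>
>>>>> (To give a rough sense of the scale: that rbuf is total_chunks * mpi_size
>>>>> bytes on the root process. Your error shows scount=16000, i.e. about 16000
>>>>> chunks, so with roughly 16000 ranks that is on the order of 16000 * 16000
>>>>> bytes = 256 MB in a single allocation, which can easily fail; fewer, larger
>>>>> chunks shrink that buffer directly.)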
>>>>>
>>>>> Quincey
>>>>>
>>>>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org