Hi Quincey,

I have the source for the application from Chris and will do some more
debugging to find out what is really going on...

Mark

On Fri, May 21, 2010 at 12:25 PM, Quincey Koziol <[email protected]> wrote:
> Hi Mark,
>
> On May 21, 2010, at 1:57 PM, Mark Howison wrote:
>
>> Hi Quincey, it is from:
>>
>>>> #008: H5FDmpio.c line 1726 in H5FD_mpio_write(): can't convert from
>>>> size to size_i
>
>        Ah, sorry!  It is indeed in the MPI-IO VFD (and was in your original
> message! :-)  Hmm, that 'size' parameter is actually a length variable, not an
> offset variable.  Are you sure all the datasets are small?
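>
>        As a quick back-of-the-envelope check of that question, here is a
> minimal sketch (the 20,000-dataset and 10GB figures are from your original
> message; everything else is illustrative):
>
> #include <stdio.h>
>
> int main(void)
> {
>     /* Figures from the original report. */
>     double file_bytes   = 10.0 * 1024 * 1024 * 1024;  /* ~10GB per file */
>     double num_datasets = 20000.0;                     /* per file */
>
>     /* If the datasets really are uniformly small, each one averages
>      * only ~512 KB -- nowhere near the 2^31 - 1 byte ceiling that the
>      * size -> size_i conversion enforces on a single transfer. */
>     printf("average dataset size: %.0f KB\n",
>            file_bytes / num_datasets / 1024.0);
>     printf("'int' ceiling: 2147483647 bytes (~2GB)\n");
>     return 0;
> }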
>
>        Quincey
>
>> Mark
>>
>> On Thu, May 20, 2010 at 4:15 AM, Quincey Koziol <[email protected]> wrote:
>>> Hi Mark,
>>>        Sorry for the delay in replying...
>>>
>>> On May 18, 2010, at 11:31 AM, Mark Howison wrote:
>>>
>>>> Hi all,
>>>>
>>>> Chris Calderon, a user at NERSC, is receiving the errors at the bottom
>>>> of the email during the following scenario:
>>>>
>>>> - a subset of 40 MPI tasks each open their own HDF5 file with
>>>> MPI-IO in collective mode, using the MPI_COMM_SELF communicator
>>>> (a minimal sketch of this setup follows below)
>>>> - each task writes about 20,000 small datasets totaling 10GB per file
>>>>
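>>>> For reference, here is a minimal sketch of that per-task setup (the
>>>> file name and the dataset loop are hypothetical, not from Chris's
>>>> actual code):
>>>>
>>>> #include <hdf5.h>
>>>> #include <mpi.h>
>>>> #include <stdio.h>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     MPI_Init(&argc, &argv);
>>>>
>>>>     int rank;
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>
>>>>     /* One file per task, opened through the MPI-IO VFD on
>>>>      * MPI_COMM_SELF. */
>>>>     char name[64];
>>>>     snprintf(name, sizeof(name), "task_%04d.h5", rank);
>>>>
>>>>     hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
>>>>     H5Pset_fapl_mpio(fapl, MPI_COMM_SELF, MPI_INFO_NULL);
>>>>     hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
>>>>
>>>>     /* Request collective transfer mode for the writes. */
>>>>     hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
>>>>     H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
>>>>
>>>>     /* ... ~20,000 H5Dwrite() calls go here, each passing dxpl so
>>>>      * the transfers take the collective code path ... */
>>>>
>>>>     H5Pclose(dxpl);
>>>>     H5Fclose(file);
>>>>     H5Pclose(fapl);
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>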
>>>> It's worth noting that we don't intend to use MPI-IO in collective mode,
>>>> so we don't really need to fix this error to make the code operational,
>>>> but we'd like to understand why the error occurred. At the lowest
>>>> level, the error is "can't convert from size to size_i" and looking up
>>>> the relevant code, I found:
>>>>
>>>> /* H5FDmpio.c: the transfer length must round-trip through an 'int' */
>>>> size_i = (int)size;
>>>> if((hsize_t)size_i != size)
>>>>     HGOTO_ERROR...
>>>>
>>>> So my guess is that the offsets at some point become large enough to
>>>> cause an int32 overflow. (Each file is about 10GB total, so the
>>>> overflow probably occurs around the 8GB mark since 2 billion elements
>>>> times 4 bytes per float = 8GB.) Is this a known bug in the MPI-IO VFD?
>>>> This suggests that the bug will also affect independent mode, but
>>>> another workaround is for us to use the MPI-POSIX VFD, which should
>>>> bypass this problem.
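>>>>
>>>> To make my guess concrete, here is a standalone sketch of the same
>>>> round-trip check (the 3GB value is illustrative; only the check
>>>> itself mirrors H5FDmpio.c):
>>>>
>>>> #include <stdio.h>
>>>> #include <stdint.h>
>>>>
>>>> typedef uint64_t hsize_t;  /* stand-in for HDF5's 64-bit hsize_t */
>>>>
>>>> int main(void)
>>>> {
>>>>     /* A single transfer longer than 2^31 - 1 bytes fails the
>>>>      * conversion check... */
>>>>     hsize_t size   = 3ULL * 1024 * 1024 * 1024;  /* 3GB request */
>>>>     int     size_i = (int)size;
>>>>
>>>>     if ((hsize_t)size_i != size)
>>>>         printf("can't convert from size to size_i (%llu bytes)\n",
>>>>                (unsigned long long)size);
>>>>
>>>>     /* ...whereas the file offset travels as a 64-bit MPI_Offset,
>>>>      * so a large offset alone would not trip this error. */
>>>>     return 0;
>>>> }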
>>>
>>>        There is a limitation in the MPI standard which specifies that an 
>>> 'int' type must be used for certain file operations, but we may be able to 
>>> relax that for the MPI-POSIX driver.  Could you give me the line number for 
>>> the code snippet above?  I'll take a look and see if it really needs to be 
>>> there.
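>>>
>>>        For reference, the limitation is visible in the MPI-2 file write
>>> calls themselves: the element count is a plain 'int', while the offset
>>> is a 64-bit MPI_Offset.  A small illustration:
>>>
>>> #include <limits.h>
>>> #include <stdio.h>
>>>
>>> /* MPI-2 prototype, reproduced for reference:
>>>  *
>>>  *   int MPI_File_write_at(MPI_File fh, MPI_Offset offset, void *buf,
>>>  *                         int count, MPI_Datatype datatype,
>>>  *                         MPI_Status *status);
>>>  */
>>> int main(void)
>>> {
>>>     /* With an MPI_BYTE datatype, one call can move at most INT_MAX
>>>      * bytes -- the ~2GB ceiling behind the size_i check. */
>>>     printf("largest single MPI_BYTE write: %d bytes\n", INT_MAX);
>>>     return 0;
>>> }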
>>>
>>>        Thanks,
>>>                Quincey
>>>
>>>> I looked into using the CORE VFD per Mark Miller's suggestion in an earlier
>>>> thread, but the problem is that the 10GB of data will not fit into memory,
>>>> and I didn't see any API calls for requesting a "dump to file" before the
>>>> file close.
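>>>>
>>>> For what it's worth, here is roughly what that CORE VFD setup would
>>>> look like (a sketch with an illustrative file name and increment,
>>>> not our actual code); with backing_store enabled the image is written
>>>> out by file close, but the whole image stays in memory for the life
>>>> of the file, hence the 10GB problem:
>>>>
>>>> #include <hdf5.h>
>>>>
>>>> int main(void)
>>>> {
>>>>     hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
>>>>
>>>>     /* Grow the in-memory image in 64MB steps; backing_store = 1
>>>>      * asks the driver to write the image to disk when the file is
>>>>      * closed.  Until then, everything lives in memory. */
>>>>     H5Pset_fapl_core(fapl, 64 * 1024 * 1024, 1);
>>>>
>>>>     hid_t file = H5Fcreate("core.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
>>>>
>>>>     /* ... dataset writes accumulate in the in-memory image ... */
>>>>
>>>>     H5Fclose(file);  /* image reaches disk no later than this point */
>>>>     H5Pclose(fapl);
>>>>     return 0;
>>>> }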
>>>>
>>>> Thanks
>>>> Mark
>>>>
>>>> ----
>>>>
>>>> HDF5-DIAG: Error detected in HDF5 (1.8.4) MPI-process 16:
>>>> #000: H5Dio.c line 266 in H5Dwrite(): can't write data
>>>>  major: Dataset
>>>>  minor: Write failed
>>>> #001: H5Dio.c line 578 in H5D_write(): can't write data
>>>>  major: Dataset
>>>>  minor: Write failed
>>>> #002: H5Dmpio.c line 552 in H5D_contig_collective_write(): couldn't
>>>> finish shared collective MPI-IO
>>>>  major: Low-level I/O
>>>>  minor: Write failed
>>>> #003: H5Dmpio.c line 1586 in H5D_inter_collective_io(): couldn't
>>>> finish collective MPI-IO
>>>>  major: Low-level I/O
>>>>  minor: Can't get value
>>>> #004: H5Dmpio.c line 1632 in H5D_final_collective_io(): optimized write 
>>>> failed
>>>>  major: Dataset
>>>>  minor: Write failed
>>>> #005: H5Dmpio.c line 334 in H5D_mpio_select_write(): can't finish
>>>> collective parallel write
>>>>  major: Low-level I/O
>>>>  minor: Write failed
>>>> #006: H5Fio.c line 167 in H5F_block_write(): file write failed
>>>>  major: Low-level I/O
>>>>  minor: Write failed
>>>> #007: H5FDint.c line 185 in H5FD_write(): driver write request failed
>>>>  major: Virtual File Layer
>>>>  minor: Write failed
>>>> #008: H5FDmpio.c line 1726 in H5FD_mpio_write(): can't convert from
>>>> size to size_i
>>>>  major: Internal error (too specific to document in detail)
>>>>  minor: Out of range
>>>>

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
