Hi Quincey,

I have the source for the application from Chris and will do some more
debugging to find out what is really going on...
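In the meantime, here is a tiny standalone sketch of the failing check, mostly
as a note on what I plan to instrument in the application. It just mirrors the
cast from H5FD_mpio_write() quoted below, and shows that the check only rejects
a write request whose length (not its offset) exceeds INT_MAX, which fits with
your point about 'size' being a length. The example lengths are made up, not
measured from Chris's run:

    #include <stdio.h>
    #include <stdint.h>
    #include <limits.h>

    typedef uint64_t hsize_t;   /* stand-in for HDF5's hsize_t */

    static int fits_in_int(hsize_t size)
    {
        int size_i = (int)size;            /* same cast as in H5FD_mpio_write() */
        return ((hsize_t)size_i == size);  /* false once size > INT_MAX */
    }

    int main(void)
    {
        hsize_t lengths[] = {
            512 * 1024,                        /* one "small" dataset          */
            (hsize_t)INT_MAX,                  /* largest length that converts */
            (hsize_t)INT_MAX + 1,              /* first length that fails      */
            (hsize_t)8 * 1024 * 1024 * 1024    /* ~8GB, my earlier guess       */
        };
        size_t i;

        for (i = 0; i < sizeof(lengths) / sizeof(lengths[0]); i++)
            printf("size = %llu -> %s\n", (unsigned long long)lengths[i],
                   fits_in_int(lengths[i]) ? "ok"
                                           : "can't convert from size to size_i");
        return 0;
    }

If an oversized request does turn up, I have put a rough sketch of the
MPI-POSIX workaround I mentioned earlier at the bottom of this message.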
Mark

On Fri, May 21, 2010 at 12:25 PM, Quincey Koziol <[email protected]> wrote:
> Hi Mark,
>
> On May 21, 2010, at 1:57 PM, Mark Howison wrote:
>
>> Hi Quincey, it is from:
>>
>>>> #008: H5FDmpio.c line 1726 in H5FD_mpio_write(): can't convert from
>>>> size to size_i
>
> Ah, sorry! It is actually in the MPI-IO VFD (and was in your original
> message! :-) Hmm, that 'size' parameter is actually a length variable,
> not an offset variable. Are you sure all the datasets are small?
>
> Quincey
>
>> Mark
>>
>> On Thu, May 20, 2010 at 4:15 AM, Quincey Koziol <[email protected]> wrote:
>>> Hi Mark,
>>> Sorry for the delay in replying...
>>>
>>> On May 18, 2010, at 11:31 AM, Mark Howison wrote:
>>>
>>>> Hi all,
>>>>
>>>> Chris Calderon, a user at NERSC, is receiving the errors at the bottom
>>>> of this email in the following scenario:
>>>>
>>>> - a subset of 40 MPI tasks each open their own HDF5 file with MPI-IO
>>>>   in collective mode, using the MPI_COMM_SELF communicator
>>>> - each task writes about 20,000 small datasets totaling 10GB per file
>>>>
>>>> It's worth noting that we don't intend to use MPI-IO in independent
>>>> mode, so we don't really need to fix this error to make the code
>>>> operational, but we'd like to understand why it occurred. At the
>>>> lowest level, the error is "can't convert from size to size_i", and
>>>> looking up the relevant code, I found:
>>>>
>>>>     size_i = (int)size;
>>>>     if((hsize_t)size_i != size)
>>>>         HGOTO_ERROR...
>>>>
>>>> So my guess is that the offsets at some point become large enough to
>>>> cause an int32 overflow. (Each file is about 10GB total, so the
>>>> overflow probably occurs around the 8GB mark, since 2 billion elements
>>>> times 4 bytes per float = 8GB.) Is this a known bug in the MPI-IO VFD?
>>>> This suggests that the bug will also affect independent mode, but
>>>> another workaround is for us to use the MPI-POSIX VFD, which should
>>>> bypass this problem.
>>>
>>> There is a limitation in the MPI standard which specifies that an
>>> 'int' type must be used for certain file operations, but we may be
>>> able to relax that for the MPI-POSIX driver. Could you give me the
>>> line number for the code snippet above? I'll take a look and see if
>>> it really needs to be there.
>>>
>>> Thanks,
>>> Quincey
>>>
>>>> I looked into using the CORE VFD per Mark Miller's suggestion in an
>>>> earlier thread, but the problem is that the 10GB of data will not fit
>>>> into memory, and I didn't see any API calls for requesting a "dump to
>>>> file" before the file close.
>>>>
>>>> Thanks
>>>> Mark
>>>>
>>>> ----
>>>>
>>>> HDF5-DIAG: Error detected in HDF5 (1.8.4) MPI-process 16:
>>>>   #000: H5Dio.c line 266 in H5Dwrite(): can't write data
>>>>     major: Dataset
>>>>     minor: Write failed
>>>>   #001: H5Dio.c line 578 in H5D_write(): can't write data
>>>>     major: Dataset
>>>>     minor: Write failed
>>>>   #002: H5Dmpio.c line 552 in H5D_contig_collective_write(): couldn't
>>>> finish shared collective MPI-IO
>>>>     major: Low-level I/O
>>>>     minor: Write failed
>>>>   #003: H5Dmpio.c line 1586 in H5D_inter_collective_io(): couldn't
>>>> finish collective MPI-IO
>>>>     major: Low-level I/O
>>>>     minor: Can't get value
>>>>   #004: H5Dmpio.c line 1632 in H5D_final_collective_io(): optimized
>>>> write failed
>>>>     major: Dataset
>>>>     minor: Write failed
>>>>   #005: H5Dmpio.c line 334 in H5D_mpio_select_write(): can't finish
>>>> collective parallel write
>>>>     major: Low-level I/O
>>>>     minor: Write failed
>>>>   #006: H5Fio.c line 167 in H5F_block_write(): file write failed
>>>>     major: Low-level I/O
>>>>     minor: Write failed
>>>>   #007: H5FDint.c line 185 in H5FD_write(): driver write request failed
>>>>     major: Virtual File Layer
>>>>     minor: Write failed
>>>>   #008: H5FDmpio.c line 1726 in H5FD_mpio_write(): can't convert from
>>>> size to size_i
>>>>     major: Internal error (too specific to document in detail)
>>>>     minor: Out of range
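For anyone else following the thread, here is a rough sketch of the MPI-POSIX
VFD workaround mentioned above. It is not tested against Chris's application;
the file name is a placeholder, and I am going from memory that the 1.8.x
H5Pset_fapl_mpiposix() takes a GPFS-hints flag as its third argument, so
please check H5FDmpiposix.h before copying this:

    #include <stdio.h>
    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* One file per task, as in Chris's code, but opened through the
         * MPI-POSIX driver instead of MPI-IO, so writes never pass through
         * H5FD_mpio_write() and its int cast. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpiposix(fapl, MPI_COMM_SELF, 0 /* no GPFS hints */);

        char name[64];
        snprintf(name, sizeof(name), "task_%04d.h5", rank);
        hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* ... the ~20,000 small float datasets would be written here with
         * plain H5Dwrite() calls; the MPI-IO transfer property list
         * (H5Pset_dxpl_mpio) does not apply to this driver ... */

        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }

The appeal over the CORE VFD is that nothing has to be held in memory, so the
10GB per file is not an issue.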
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org