Hi Robert,
This is a known limitation in Parallel HDF5. It is really a limitation
inside ROMIO, which should be part of your MPI library: you cannot
write more than 2 GB of data in one call. I do not have a specific
timetable for when this will be fixed.
More details about the problem can be found here:
http://www.hdfgroup.org/hdf5-quest.html#p2gb
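In the meantime, a possible workaround is to split one large write into
several smaller hyperslab writes, so that each underlying MPI-IO call
stays below 2 GB. A rough C sketch of that idea (the helper name
write_in_pieces and the 1 GB piece size are only illustrative; it
assumes a 1-D dataset of doubles that is already open):

/* Sketch of a workaround, not an official fix: split one large write
 * into several H5Dwrite calls whose hyperslab selections each stay
 * well under 2 GB, so every underlying MPI-IO call sees a count below
 * the 2^31-byte limit.  dset, dxpl, buf, and nelems are placeholders. */
#include <hdf5.h>

static herr_t write_in_pieces(hid_t dset, hid_t dxpl,
                              const double *buf, hsize_t nelems)
{
    const hsize_t piece = (hsize_t)1 << 27;   /* 2^27 doubles = 1 GB per call */
    hid_t filespace = H5Dget_space(dset);

    for (hsize_t start = 0; start < nelems; start += piece) {
        hsize_t count = (nelems - start < piece) ? (nelems - start) : piece;

        /* Select this piece of the dataset in the file... */
        if (H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
                                &start, NULL, &count, NULL) < 0)
            return -1;

        /* ...and a matching in-memory dataspace of the same size. */
        hid_t  memspace = H5Screate_simple(1, &count, NULL);
        herr_t status   = H5Dwrite(dset, H5T_NATIVE_DOUBLE,
                                   memspace, filespace, dxpl, buf + start);
        H5Sclose(memspace);
        if (status < 0)
            return -1;
    }

    H5Sclose(filespace);
    return 0;
}

The same approach should work from Fortran with h5sselect_hyperslab_f
and h5dwrite_f.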
Thanks,
Mohamad
On 1/24/2013 3:01 PM, Robert McLay wrote:
I am unable to write a 2 GB dataset from a single task. I have the
same problem with 1.8.8, 1.8.9, and 1.8.10. I have attached
a Fortran 90 program that shows the problem. I also have a C program
that shows the same problem, so I do not think this is a problem
of Fortran-to-C conversion. The program is a parallel Fortran 90 program
run as a single task.
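(For context, here is a minimal C sketch of the same kind of write, a
single MPI rank doing a collective H5Dwrite of a buffer just over 2 GB.
It is not the attached program; the file name, dataset name, and lack
of error handling are placeholders.)

#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Open the file through the MPI-IO file driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* 646^3 doubles = 2,156,689,088 bytes, just over 2 GB. */
    hsize_t dims[1] = { (hsize_t)646 * 646 * 646 };
    hid_t   space   = H5Screate_simple(1, dims, NULL);
    hid_t   dset    = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Collective transfer, as in the error stack below. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    double *buf = calloc(dims[0], sizeof(double));
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf); /* fails */

    free(buf);
    H5Pclose(dxpl); H5Dclose(dset); H5Sclose(space);
    H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}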
Here is the error report:
$ ./example
HDF5-DIAG: Error detected in HDF5 (1.8.10) MPI-process 0:
#000: H5Dio.c line 266 in H5Dwrite(): can't write data
major: Dataset
minor: Write failed
#001: H5Dio.c line 673 in H5D__write(): can't write data
major: Dataset
minor: Write failed
#002: H5Dmpio.c line 544 in H5D__contig_collective_write(): couldn't
finish shared collective MPI-IO
major: Low-level I/O
minor: Write failed
#003: H5Dmpio.c line 1523 in H5D__inter_collective_io(): couldn't
finish collective MPI-IO
major: Low-level I/O
minor: Can't get value
#004: H5Dmpio.c line 1567 in H5D__final_collective_io(): optimized
write failed
major: Dataset
minor: Write failed
#005: H5Dmpio.c line 312 in H5D__mpio_select_write(): can't finish
collective parallel write
major: Low-level I/O
minor: Write failed
#006: H5Fio.c line 158 in H5F_block_write(): write through metadata
accumulator failed
major: Low-level I/O
minor: Write failed
#007: H5Faccum.c line 816 in H5F_accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#008: H5FDint.c line 185 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#009: H5FDmpio.c line 1842 in H5FD_mpio_write():
MPI_File_write_at_all failed
major: Internal error (too specific to document in detail)
minor: Some MPI function failed
#010: H5FDmpio.c line 1842 in H5FD_mpio_write(): Invalid argument,
error stack:
MPI_FILE_WRITE_AT_ALL(84): Invalid count argument
major: Internal error (too specific to document in detail)
minor: MPI Error String
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread.so.0 00000031C4D0C4F0 Unknown Unknown Unknown
libc.so.6 00000031C3E721E3 Unknown Unknown Unknown
example 000000000071C646 Unknown Unknown Unknown
libmpich.so.3 00002B0682CBF48C Unknown Unknown Unknown
libmpich.so.3 00002B0682E6AE91 Unknown Unknown Unknown
libmpich.so.3 00002B0682E6BDB2 Unknown Unknown Unknown
example 00000000004AE9EC Unknown Unknown Unknown
example 00000000004A9014 Unknown Unknown Unknown
example 0000000000497040 Unknown Unknown Unknown
example 00000000004990A2 Unknown Unknown Unknown
example 0000000000693991 Unknown Unknown Unknown
example 00000000004597A9 Unknown Unknown Unknown
example 000000000045C144 Unknown Unknown Unknown
example 000000000043B6B4 Unknown Unknown Unknown
Attached is the configuration data.
Attached is a program which produces the error report. I compiled
this Fortran 90 program with:
h5pfc -g -O0 -o example example.f90
Internal to the program is a variable "LocalSz", which is 646;
8*(646^3) bytes is bigger than 2*1024^3 bytes. The program works if LocalSz is 645.
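Spelling the arithmetic out as a small standalone check (not part of
the attached program; the cutoff matches a signed 32-bit byte count
somewhere in the MPI-IO path):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Byte counts for an 8-byte real array of LocalSz^3 elements. */
    long long bytes_646 = 8LL * 646 * 646 * 646;  /* 2,156,689,088 */
    long long bytes_645 = 8LL * 645 * 645 * 645;  /* 2,146,689,000 */

    printf("LocalSz=646: %lld bytes, INT_MAX = %d -> too big for one call\n",
           bytes_646, INT_MAX);
    printf("LocalSz=645: %lld bytes -> fits in one call\n", bytes_645);
    return 0;
}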
Thanks for looking at this.
--
Robert McLay, Ph.D.
TACC
Manager, HPC Software Tools
(512) 232-8104
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org