Thank you for the explanation. That's consistent with what I see when I add a debug printf into H5D__construct_filtered_io_info_list(). So I'm now looking into the filter situation. It's possible that the H5Z-blosc glue is mishandling the case where the compressed data is larger than the uncompressed data.
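To make sure I'm reading the compress path correctly, here is roughly what I expect that branch of the filter callback to do when the data does not shrink (a sketch only, using the standard H5Z_func_t signature; the cd_values layout and the names are mine for illustration, not the actual H5Z-blosc code):

#include <stdlib.h>
#include <hdf5.h>
#include <blosc.h>

/* Compress-side sketch of an H5Z_func_t callback.  The cd_values layout
 * (typesize, compression level, shuffle flag) is illustrative; the real
 * glue code may order these differently. */
static size_t
blosc_compress_sketch(unsigned flags, size_t cd_nelmts, const unsigned cd_values[],
                      size_t nbytes, size_t *buf_size, void **buf)
{
    if (flags & H5Z_FLAG_REVERSE)
        return 0;                        /* decompress path omitted here */

    size_t typesize = (cd_nelmts > 2) ? cd_values[2] : 1;
    int    clevel   = (cd_nelmts > 4) ? (int)cd_values[4] : 5;
    int    shuffle  = (cd_nelmts > 5) ? (int)cd_values[5] : 1;

    /* Cap the output at the input size: if blosc cannot shrink the chunk,
     * blosc_compress() returns 0 instead of handing back a bigger buffer. */
    void *outbuf = malloc(nbytes);
    if (outbuf == NULL)
        return 0;

    int status = blosc_compress(clevel, shuffle, typesize,
                                nbytes, *buf, outbuf, nbytes);
    if (status <= 0) {
        /* 0: the compressed result would be larger than the input;
         * <0: internal blosc error.  Returning 0 makes the filter "fail";
         * for an optional filter HDF5 then stores the chunk unfiltered and
         * records in the chunk's filter mask that the filter was skipped. */
        free(outbuf);
        return 0;
    }

    free(*buf);
    *buf      = outbuf;
    *buf_size = nbytes;
    return (size_t)status;               /* number of bytes actually stored */
}

If the glue instead hands back an oversized buffer, or fails without the filter having been registered as optional, I could see the later "unfilter chunk for modifying" step tripping over it, which would match the stack below.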
About to write 12 of 20
About to write 0 of 20
About to write 0 of 20
About to write 8 of 20
Rank 0 selected 12 of 20
Rank 1 selected 8 of 20
HDF5-DIAG: Error detected in HDF5 (1.11.0) MPI-process 0:
  #000: H5Dio.c line 319 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 395 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 836 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dmpio.c line 1019 in H5D__chunk_collective_write(): write error
    major: Dataspace
    minor: Write failed
  #004: H5Dmpio.c line 934 in H5D__chunk_collective_io(): couldn't finish filtered linked chunk MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #005: H5Dmpio.c line 1474 in H5D__link_chunk_filtered_collective_io(): couldn't process chunk entry
    major: Dataset
    minor: Write failed
  #006: H5Dmpio.c line 3278 in H5D__filtered_collective_chunk_entry_io(): couldn't unfilter chunk for modifying
    major: Data filters
    minor: Filter operation failed
  #007: H5Z.c line 1256 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed

On Thu, Nov 9, 2017 at 1:02 PM, Jordan Henderson <jhender...@hdfgroup.org> wrote:

> For the purpose of collective I/O it is true that all ranks must call
> H5Dwrite() so that they can participate in those collective operations that
> are necessary (the file space re-allocation and so on). However, even though
> they called H5Dwrite() with a valid memspace, the fact that they have a NONE
> selection in the given file space should cause their chunk-file mapping
> struct (see lines 357-385 of H5Dpkg.h for the struct's definition and the
> code for H5D__link_chunk_filtered_collective_io() to see how it uses this
> built up list of chunks selected in the file) to contain no entries in the
> "fm->sel_chunks" field. That alone should mean that during the chunk
> redistribution, they will not actually send anything at all to any of the
> ranks. They only participate there for the sake that, were the method of
> redistribution modified, ranks which previously had no chunks selected could
> potentially be given some chunks to work on.
>
> For all practical purposes, every single chunk_entry seen in the list from
> rank 0's perspective should be a valid I/O caused by some rank writing some
> positive amount of bytes to the chunk. On rank 0's side, you should be able
> to check the io_size field of each of the chunk_entry entries and see how
> big the I/O is from the "original_owner" to that chunk. If any of these are
> 0, something is likely very wrong. If that is indeed the case, you could
> likely pull a hacky workaround by manually removing them from the list, but
> I'd be more concerned about the root of the problem if there are zero-size
> I/O chunk_entry entries being added to the list.
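For reference, the write pattern that produces the output above is essentially the following (a minimal sketch, not my actual test: the file and dataset names and sizes are made up, H5Pset_deflate stands in for the registered blosc filter, and which ranks write 12 and 8 elements is illustrative; the relevant part is that the unselected ranks still call H5Dwrite() collectively with a NONE selection in the file space):

#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Open the file with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("filtered.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Chunked, filtered dataset of 20 elements (deflate as a stand-in). */
    hsize_t dims[1] = {20}, chunk[1] = {5};
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    H5Pset_deflate(dcpl, 6);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_INT, filespace,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Rank 0 writes 12 elements, rank 3 writes 8, everyone else writes none. */
    hsize_t start[1] = {0}, count[1] = {0};
    if (rank == 0)      { start[0] = 0;  count[0] = 12; }
    else if (rank == 3) { start[0] = 12; count[0] = 8;  }

    /* Every rank passes a valid memspace; ranks with nothing to write
     * select NONE in both the memory and the file dataspace. */
    hsize_t memdims[1] = { count[0] > 0 ? count[0] : 1 };
    hid_t memspace = H5Screate_simple(1, memdims, NULL);
    if (count[0] > 0) {
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    } else {
        H5Sselect_none(memspace);
        H5Sselect_none(filespace);
    }

    int *wbuf = malloc(memdims[0] * sizeof(int));
    for (hsize_t i = 0; i < count[0]; i++)
        wbuf[i] = rank;

    /* Collective transfer: all ranks call H5Dwrite(), selected or not. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, wbuf);

    free(wbuf);
    H5Pclose(dxpl); H5Sclose(memspace); H5Dclose(dset);
    H5Pclose(dcpl); H5Sclose(filespace); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}

Run with something like "mpiexec -n 4 ./a.out" against a build that has the parallel filtered-write support (the 1.11.0 develop branch in my case).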