Actually, it's not H5Screate() that crashes; that has worked fine since HDF5 1.8.7. It's a zero-sized malloc somewhere inside the call to H5Dwrite(), possibly in the filter. I think this is close to resolution; I just need to get the right tools on it.
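For reference, here's a rough, untested sketch of the pattern under discussion: give every rank a real simple dataspace and call H5Sselect_none() on the ranks that have nothing to write, instead of handing H5Dwrite() an H5S_NULL dataspace or a zero-sized memspace. It assumes a parallel HDF5 build, and the helper name and variables (write_maybe_empty, nlocal, offset, buf) are placeholders, not PETSc's actual code:

    #include "hdf5.h"

    /* Collective 1-D write in which some ranks may contribute zero elements. */
    static herr_t write_maybe_empty(hid_t dset, hid_t memtype,
                                    hsize_t nlocal,   /* elements owned by this rank */
                                    hsize_t offset,   /* this rank's start in the file */
                                    const void *buf)
    {
        hsize_t dims[1]   = { nlocal > 0 ? nlocal : 1 };  /* never a 0-sized simple space */
        hid_t   memspace  = H5Screate_simple(1, dims, NULL);
        hid_t   filespace = H5Dget_space(dset);
        hid_t   dxpl      = H5Pcreate(H5P_DATASET_XFER);
        herr_t  status;

        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);  /* filtered writes must stay collective */

        if (nlocal > 0) {
            hsize_t start[1] = { offset };
            hsize_t count[1] = { nlocal };
            H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        } else {
            /* Empty selection on both spaces keeps this rank in the collective call
             * without passing an H5S_NULL dataspace. */
            H5Sselect_none(memspace);
            H5Sselect_none(filespace);
        }

        /* Some HDF5 versions reject a NULL buffer even for an empty selection,
         * so pass a dummy non-NULL pointer when nlocal == 0. */
        status = H5Dwrite(dset, memtype, memspace, filespace, dxpl,
                          nlocal > 0 ? buf : (const void *)dims);

        H5Pclose(dxpl);
        H5Sclose(filespace);
        H5Sclose(memspace);
        return status;
    }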
On Thu, Nov 9, 2017 at 8:47 AM, Michael K. Edwards <m.k.edwa...@gmail.com> wrote:
> Apparently this has been reported before as a problem with PETSc/HDF5
> integration:
> https://lists.mcs.anl.gov/pipermail/petsc-users/2012-January/011980.html
>
> On Thu, Nov 9, 2017 at 8:37 AM, Michael K. Edwards
> <m.k.edwa...@gmail.com> wrote:
>> Thank you for the validation, and for the suggestion to use
>> H5Sselect_none(). That is probably the right thing for the dataspace.
>> Not quite sure what to do about the memspace, though; the comment is
>> correct that we crash if any of the dimensions is zero.
>>
>> On Thu, Nov 9, 2017 at 8:34 AM, Jordan Henderson
>> <jhender...@hdfgroup.org> wrote:
>>> It seems you're discovering the issues right as I'm typing this!
>>>
>>> I'm glad you were able to resolve the hang. I was starting to suspect an
>>> issue with the MPI implementation, but that is usually the last thing on
>>> the list after inspecting the code itself.
>>>
>>> As you've seen, PETSc appears to be creating a NULL dataspace for the
>>> ranks that are not contributing, instead of creating a Scalar/Simple
>>> dataspace on all ranks and calling H5Sselect_none() for those that don't
>>> participate. That most likely explains the assertion failure you saw in
>>> the non-filtered case, since the legacy code was probably not expecting a
>>> NULL dataspace. On top of that, the NULL dataspace seems to be causing the
>>> parallel operation to break collective mode, which is not allowed when
>>> filters are involved. I would need to research why this happens before
>>> deciding whether it's more appropriate to change HDF5 or to have PETSc
>>> stop using NULL dataspaces.
>>>
>>> Avoiding deadlock in the final sort is an issue I have had to re-tackle
>>> several times because of the code's complexity, but I will investigate
>>> using the chunk offset as a secondary sort key and see whether it causes
>>> problems in any other cases. Ideally, the chunk redistribution would
>>> eventually be updated to involve all ranks rather than just rank 0, which
>>> would also allow improvements to the redistribution algorithm that may
>>> solve these problems, but for the time being this may be sufficient.