It seems you're discovering the issues right as I'm typing this!
I'm glad you were able to solve the hanging issue. I was starting to suspect the MPI implementation, but that's usually the last thing to check after inspecting the code itself. As you've seen, PETSc appears to create a NULL dataspace on the ranks that aren't contributing, instead of creating a Scalar/Simple dataspace on all ranks and calling H5Sselect_none() on those that don't participate (a sketch of that pattern is below). That would most likely explain the assertion failure you saw in the non-filtered case, since the legacy code probably wasn't expecting to receive a NULL dataspace. On top of that, the NULL dataspace seems to cause the parallel operation to break collective mode, which isn't allowed when filters are involved. I'll need to do some research into why that happens before deciding whether it's more appropriate to fix this in HDF5 or to have PETSc avoid NULL dataspaces.

Avoiding deadlock from the final sort is something I've had to re-tackle a few times because of the complexity of that code, but I will investigate using the chunk offset as a secondary sort key (second sketch below) and see whether it runs into problems in any other cases. Ideally, the chunk redistribution will eventually be reworked to involve all ranks instead of just rank 0, which would also allow improvements to the redistribution algorithm that may solve these problems, but for the time being this may be sufficient.
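For reference, here is a minimal sketch (plain C, not PETSc's actual code and not a proposed HDF5 patch) of the pattern I mean: every rank keeps a real dataspace, and the ranks with nothing to write simply select nothing, so the H5Dwrite() call can stay collective even with filters enabled. The function name, the 1-D layout, and the dummy extent on empty ranks are illustrative assumptions.

    #include <hdf5.h>

    /* Sketch: collective write where some ranks contribute no data.
     * All ranks create real dataspaces; empty ranks call H5Sselect_none()
     * instead of passing a NULL dataspace, and still take part in the
     * collective H5Dwrite() call. */
    static herr_t
    write_collectively(hid_t dset, hsize_t my_offset, hsize_t my_count,
                       const double *buf)
    {
        herr_t  status;
        hsize_t mem_dims = (my_count > 0) ? my_count : 1; /* avoid a 0-sized extent */

        hid_t file_space = H5Dget_space(dset);            /* dataset's file dataspace */
        hid_t mem_space  = H5Screate_simple(1, &mem_dims, NULL);

        if (my_count > 0) {
            /* Contributing ranks select their slab of the dataset. */
            H5Sselect_hyperslab(file_space, H5S_SELECT_SET,
                                &my_offset, NULL, &my_count, NULL);
        } else {
            /* Non-contributing ranks select nothing but keep real dataspaces. */
            H5Sselect_none(file_space);
            H5Sselect_none(mem_space);
        }

        /* Collective transfer mode, required for parallel writes with filters. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        /* Every rank makes this call; buf may point at a dummy value on
         * ranks whose selection is empty. */
        status = H5Dwrite(dset, H5T_NATIVE_DOUBLE, mem_space, file_space,
                          dxpl, buf);

        H5Pclose(dxpl);
        H5Sclose(mem_space);
        H5Sclose(file_space);
        return status;
    }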
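On the sort-key idea, the second sketch below is purely illustrative (the struct, its fields, and the choice of primary key are assumptions, not HDF5's internal chunk types). The point is just that adding the chunk's file offset as a deterministic tie-breaker gives every rank the same total order over the chunk list, which is what keeps ranks from disagreeing on the final ordering and deadlocking.

    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative only -- not HDF5's internal chunk structure.  Adding the
     * chunk's file offset as a secondary key makes the comparison a total
     * order, so every rank sorting the same list gets the same result. */
    typedef struct {
        uint64_t num_writers;   /* assumed primary sort key */
        uint64_t chunk_offset;  /* chunk's address in the file (tie-breaker) */
    } chunk_entry_t;

    static int
    cmp_chunk_entries(const void *a, const void *b)
    {
        const chunk_entry_t *x = (const chunk_entry_t *)a;
        const chunk_entry_t *y = (const chunk_entry_t *)b;

        if (x->num_writers != y->num_writers)
            return (x->num_writers < y->num_writers) ? -1 : 1;
        /* Secondary key: equal primary keys still sort deterministically. */
        if (x->chunk_offset != y->chunk_offset)
            return (x->chunk_offset < y->chunk_offset) ? -1 : 1;
        return 0;
    }

    /* Usage: qsort(entries, nentries, sizeof(chunk_entry_t), cmp_chunk_entries); */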