It seems you're discovering the issues right as I'm typing this!

I'm glad you were able to resolve the hanging issue. I was starting to 
suspect a problem with the MPI implementation, though that's usually the last 
thing on the list after inspecting the code itself.


As you've seen, it appears that PETSc is creating a NULL dataspace for the ranks 
that are not contributing, instead of creating a scalar/simple dataspace on 
all ranks and calling H5Sselect_none() on those that don't participate. This 
most likely explains the assertion failure you saw in the non-filtered case, 
as the legacy code was probably not expecting to receive a NULL dataspace. On 
top of that, the NULL dataspace appears to cause the parallel operation to 
break collective mode, which is not allowed when filters are involved. I will 
need to do some research into why this happens before deciding whether it's 
more appropriate to modify this in HDF5 or to have PETSc not use NULL 
dataspaces.
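
For reference, the pattern I have in mind looks roughly like the following. 
This is just a sketch, assuming a 1-D dataset opened collectively; 
`participates`, `my_offset`, `my_count`, and `local_buf` are placeholder names 
rather than anything from PETSc:

#include "hdf5.h"

/* Minimal sketch: every rank keeps a simple dataspace and calls H5Dwrite
 * collectively; non-participating ranks simply select nothing. */
static herr_t
write_my_piece(hid_t dset_id, int participates, hsize_t my_offset,
               hsize_t my_count, const double *local_buf)
{
    hid_t  dxpl_id   = H5Pcreate(H5P_DATASET_XFER);
    hid_t  filespace = H5Dget_space(dset_id);  /* simple dataspace on every rank */
    hid_t  memspace;
    herr_t status;

    /* Collective transfer is required for parallel writes to filtered datasets */
    H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);

    if (participates) {
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &my_offset, NULL, &my_count, NULL);
        memspace = H5Screate_simple(1, &my_count, NULL);
    }
    else {
        /* Instead of an H5S_NULL dataspace, keep a simple dataspace and
         * select nothing so this rank can still take part in the collective call. */
        hsize_t zero = 0;

        H5Sselect_none(filespace);
        memspace = H5Screate_simple(1, &zero, NULL);
    }

    /* Every rank calls H5Dwrite, even those contributing no data */
    status = H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl_id, local_buf);

    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Pclose(dxpl_id);

    return status;
}

With that pattern every rank passes a simple dataspace to H5Dwrite, so the 
collective requirement for filtered datasets can still be satisfied.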


Avoiding deadlock from the final sort is an issue I've had to re-tackle a 
few times due to the complexity of the code, but I will investigate using the 
chunk offset as a secondary sort key and see whether it runs into problems in 
any other cases. Ideally, the chunk redistribution will eventually be updated 
to involve all ranks in the operation instead of just rank 0, which would also 
allow improvements to the redistribution algorithm that may solve these 
problems, but for the time being this may be sufficient.
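
To give an idea of what I mean by the secondary sort key, here's a rough 
sketch of the comparison; the struct and field names are made up for 
illustration and are not HDF5's internal types:

#include <stdlib.h>

typedef struct {
    int       new_owner;    /* rank the chunk is being assigned to */
    long long chunk_offset; /* chunk's address/offset in the file */
} chunk_entry_t;

static int
cmp_chunk_entries(const void *a, const void *b)
{
    const chunk_entry_t *ca = (const chunk_entry_t *)a;
    const chunk_entry_t *cb = (const chunk_entry_t *)b;

    /* Primary key: owning rank */
    if (ca->new_owner != cb->new_owner)
        return (ca->new_owner < cb->new_owner) ? -1 : 1;

    /* Secondary key: chunk offset, so ties are broken the same way everywhere */
    if (ca->chunk_offset != cb->chunk_offset)
        return (ca->chunk_offset < cb->chunk_offset) ? -1 : 1;

    return 0;
}

/* usage: qsort(entries, num_entries, sizeof(chunk_entry_t), cmp_chunk_entries); */

Assuming the chunk offsets are unique, every rank would derive the same total 
ordering, which is the property needed to avoid the mismatched communication 
that leads to the deadlock.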