Actually, it's not the H5Screate() call that crashes; zero-sized
dimensions have been handled correctly there since HDF5 1.8.7.  The
crash is a zero-sized malloc somewhere inside the call to H5Dwrite(),
possibly in the filter.  I think this is close to resolution; I just
have to get the right tools on it.
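
For what it's worth, the failure mode is consistent with a filter
callback that calls malloc(nbytes) and treats a NULL return as an
error; malloc(0) is allowed to return NULL.  A minimal sketch of the
defensive guard, using a hypothetical H5Z-style filter (the name and
the pass-through body are placeholders, not the actual filter in
play):

    #include <stdlib.h>
    #include <string.h>
    #include <hdf5.h>

    /* Hypothetical filter callback; only the zero-size guard matters. */
    static size_t
    example_filter(unsigned int flags, size_t cd_nelmts,
                   const unsigned int cd_values[], size_t nbytes,
                   size_t *buf_size, void **buf)
    {
        (void)flags; (void)cd_nelmts; (void)cd_values; /* unused here */

        /* Request at least one byte so a conforming malloc() cannot
         * legitimately return NULL for a zero-sized chunk. */
        void *outbuf = malloc(nbytes > 0 ? nbytes : 1);

        if (outbuf == NULL)
            return 0;                 /* 0 signals failure to H5Z */

        memcpy(outbuf, *buf, nbytes); /* stand-in for real (de)compression */
        free(*buf);
        *buf      = outbuf;
        *buf_size = nbytes > 0 ? nbytes : 1;
        return nbytes;                /* NB: a zero return is ambiguous
                                         with the failure code, which is
                                         part of why zero-sized chunks
                                         are awkward for filters */
    }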

On Thu, Nov 9, 2017 at 8:47 AM, Michael K. Edwards
<m.k.edwa...@gmail.com> wrote:
> Apparently this has been reported before as a problem with PETSc/HDF5
> integration:  
> https://lists.mcs.anl.gov/pipermail/petsc-users/2012-January/011980.html
>
> On Thu, Nov 9, 2017 at 8:37 AM, Michael K. Edwards
> <m.k.edwa...@gmail.com> wrote:
>> Thank you for the validation, and for the suggestion to use
>> H5Sselect_none().  That is probably the right thing for the file
>> dataspace.  I'm not quite sure what to do about the memspace, though;
>> the comment is correct that we crash if any of the dimensions is
>> zero.
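>>
>> One possibility I'm considering (a sketch only; dset_id, dxpl_id, and
>> the double datatype are placeholders, and I'm assuming a 1-D
>> dataset): give the memspace a nonzero extent and make the selection
>> empty, so no dimension is ever zero:
>>
>>     hsize_t one[1] = {1};
>>     double  dummy  = 0.0;   /* never dereferenced: selection is empty */
>>
>>     /* Nonzero extent keeps us off the zero-dimension code path... */
>>     hid_t memspace  = H5Screate_simple(1, one, NULL);
>>     hid_t filespace = H5Dget_space(dset_id);
>>
>>     /* ...and empty selections mean this rank contributes nothing
>>      * while still participating in the collective call. */
>>     H5Sselect_none(memspace);
>>     H5Sselect_none(filespace);
>>
>>     H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace,
>>              dxpl_id, &dummy);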
>>
>> On Thu, Nov 9, 2017 at 8:34 AM, Jordan Henderson
>> <jhender...@hdfgroup.org> wrote:
>>> It seems you're discovering the issues right as I'm typing this!
>>>
>>> I'm glad you were able to resolve the hang. I was starting to suspect
>>> an issue with the MPI implementation, but that's usually the last
>>> thing on the list after inspecting the code itself.
>>>
>>> As you've seen, it seems that PETSc is creating a NULL dataspace on the
>>> ranks that are not contributing, instead of creating a scalar/simple
>>> dataspace on all ranks and calling H5Sselect_none() on those that don't
>>> participate. This most likely explains the assertion failure you saw in
>>> the non-filtered case, as the legacy code probably was not expecting to
>>> receive a NULL dataspace. On top of that, the NULL dataspace seems to be
>>> causing the parallel operation to break collective mode, which is not
>>> allowed when filters are involved. I would need to research why this
>>> happens before deciding whether it's more appropriate to modify HDF5 or
>>> to have PETSc not use NULL dataspaces.
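>>>
>>> To make the distinction concrete, a minimal sketch of the two patterns
>>> (the variable names are placeholders, and I'm assuming a 1-D extent):
>>>
>>>     /* What PETSc appears to do on non-contributing ranks: */
>>>     hid_t space = H5Screate(H5S_NULL);
>>>
>>>     /* What the filtered collective path expects instead: an ordinary
>>>      * simple dataspace whose selection is simply empty. */
>>>     hsize_t dims[1] = {1};
>>>     hid_t space2 = H5Screate_simple(1, dims, NULL);
>>>     H5Sselect_none(space2);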
>>>
>>> Avoiding deadlock in the final sort is an issue I have had to re-tackle
>>> a few times due to the complexity of the code, but I will investigate
>>> using the chunk offset as a secondary sort key and see whether it runs
>>> into problems in any other cases. Ideally, the chunk redistribution
>>> would eventually be updated to involve all ranks in the operation
>>> instead of just rank 0, which would also allow improvements to the
>>> redistribution algorithm that may solve these problems, but for the
>>> time being the secondary sort key may be sufficient.
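>>>
>>> As a sketch of the secondary-key idea (the struct and field names are
>>> illustrative, not the actual internal types):
>>>
>>>     #include <stdlib.h>
>>>     #include <hdf5.h>
>>>
>>>     /* Illustrative per-chunk record; not the real internal type. */
>>>     struct chunk_entry {
>>>         int     owner_rank;   /* rank assigned to process the chunk */
>>>         haddr_t chunk_offset; /* chunk's address in the file */
>>>     };
>>>
>>>     /* qsort comparator: owner rank first, chunk offset second.  The
>>>      * secondary key makes the order total and identical on every
>>>      * rank, so the collective operations stay in lock-step. */
>>>     static int
>>>     cmp_chunk_entries(const void *a, const void *b)
>>>     {
>>>         const struct chunk_entry *ca = a, *cb = b;
>>>
>>>         if (ca->owner_rank != cb->owner_rank)
>>>             return (ca->owner_rank < cb->owner_rank) ? -1 : 1;
>>>         if (ca->chunk_offset != cb->chunk_offset)
>>>             return (ca->chunk_offset < cb->chunk_offset) ? -1 : 1;
>>>         return 0;
>>>     }
>>>
>>>     /* Usage: qsort(entries, num_entries, sizeof(entries[0]),
>>>      *              cmp_chunk_entries); */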
