Hi, I'm currently working on a CUDA-based compression filter, and I would find it very useful to have some more callback functions for managing GPU resources. Unfortunately, allocating memory on a CUDA device has considerable overhead, and having to do it separately for each chunk in the dataset slows down compression and decompression by up to a factor of 10.

Right now I have two workarounds for writing the data (compression):

1. Allocate the device memory in the *set local* callback function and store the pointer in the cd_values array. Disadvantage: possible memory leak if the resources are not freed manually afterwards.
2. Manage resources and compression manually and use H5DOwrite_chunk to write the dataset. Disadvantage: more difficult to include in other applications.

For reading (decompression) I have no workaround to keep track of the device memory, so it has to be allocated and deallocated for every chunk when reading the dataset.

It would be great to have two more callback functions for this kind of task: one could be called before opening the dataset, and the other after closing the dataset. These callback functions could then pass information to the filter function in the same way as *set local* does: either through cd_values or, if that could cause inconsistencies, through a new variable.

Having these features would really help to boost my filter's performance, and I think other third-party filters could also benefit from this.

Cheers,
Bálint
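
For readers unfamiliar with workaround 1, here is a minimal sketch of how a pointer could be carried through the cd_values array. It is only an illustration, not the actual filter code: the helper names (pack_ptr, unpack_ptr, demo_roundtrip) and the slot layout are hypothetical, malloc stands in for cudaMalloc so the snippet builds without CUDA or HDF5, and it assumes that unsigned int is 32 bits wide, so a 64-bit pointer needs two slots.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: two consecutive cd_values slots hold one pointer. */
enum { PTR_LO = 0, PTR_HI = 1, PTR_SLOTS = 2 };

/* Split a pointer into two 32-bit halves and store them in cd_values.
 * Assumes unsigned int is 32 bits wide. */
void pack_ptr(unsigned int cd_values[], void *ptr)
{
    uint64_t bits = (uint64_t)(uintptr_t)ptr;
    cd_values[PTR_LO] = (unsigned int)(bits & 0xFFFFFFFFu);
    cd_values[PTR_HI] = (unsigned int)(bits >> 32);
}

/* Rebuild the pointer from the two halves, as the filter function would. */
void *unpack_ptr(const unsigned int cd_values[])
{
    uint64_t bits = ((uint64_t)cd_values[PTR_HI] << 32)
                  | (uint64_t)cd_values[PTR_LO];
    return (void *)(uintptr_t)bits;
}

/* Round trip: allocate once, pack, unpack, free.
 * malloc/free are stand-ins for cudaMalloc/cudaFree. */
int demo_roundtrip(void)
{
    unsigned int cd_values[PTR_SLOTS] = {0, 0};
    void *buf = malloc(1024);
    if (buf == NULL)
        return 0;

    pack_ptr(cd_values, buf);            /* what *set local* would do */
    void *seen = unpack_ptr(cd_values);  /* what the filter would do  */
    assert(seen == buf);

    free(buf);  /* nothing calls this automatically: the leak risk above */
    return 1;
}
```

The final free is the step that has no natural home in the current callback set, which is why a callback after closing the dataset would help.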
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
