luccadibe opened a new pull request, #2394: URL: https://github.com/apache/systemds/pull/2394
This PR fixes and optimizes the SystemDS HDF5Reader implementation so that it can correctly read the So2Sat LCZ42 dataset (https://mediatum.ub.tum.de/1454690). To that end, I added support for the HDF5 filter pipeline and attribute message types; n-dimensional matrices with n > 2 are flattened into 2D. I also added support for inferring the HDF5 format from the .h5 file extension. Apologies for the massive PR :face_with_head_bandage:.

I benchmarked the performance of the new implementation and shared the results in this repo: https://github.com/luccadibe/systemds-hdf5-reader-benchmark

The code still needs some work on code style and formatting. I am not sure whether I set up the formatter correctly as described in CONTRIBUTING.md; in some files I was getting a huge diff, so I tried to format only what I touched.

I am unsure how best to split this into multiple PRs, or whether that is even wanted. I would appreciate some general feedback on this.
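
To make the flattening of n-dimensional datasets mentioned above concrete, here is a minimal sketch of one possible mapping. It is not taken from the PR: the class `Hdf5ShapeSketch` and its methods are hypothetical, and it assumes that the first dimension is kept as the row count, that all remaining dimensions are collapsed into columns, and that the data is laid out in row-major order. The shape used in `main` is illustrative only.

```java
import java.util.Arrays;

// Hypothetical helper, not part of SystemDS: illustrates one way to map an
// n-dimensional HDF5 dataset shape onto a 2-D matrix shape.
final class Hdf5ShapeSketch {

	// Assumption: the first dimension becomes the row count and all
	// remaining dimensions are collapsed into the column count.
	static long[] flattenTo2D(long[] dims) {
		if (dims == null || dims.length == 0)
			throw new IllegalArgumentException("dataset must have at least one dimension");
		long cols = 1;
		for (int i = 1; i < dims.length; i++)
			cols = Math.multiplyExact(cols, dims[i]); // fail loudly on overflow
		return new long[] {dims[0], cols};
	}

	// Assumption: row-major (C order) layout.
	// Maps an n-dimensional index to the column offset within its row.
	static long columnOf(long[] dims, long[] index) {
		long col = 0;
		for (int i = 1; i < dims.length; i++)
			col = col * dims[i] + index[i];
		return col;
	}

	public static void main(String[] args) {
		long[] dims = {4, 32, 32, 10}; // illustrative shape only
		System.out.println(Arrays.toString(flattenTo2D(dims)));
		System.out.println(columnOf(dims, new long[] {0, 1, 2, 3}));
	}
}
```

Under these assumptions, a dataset of shape 4 x 32 x 32 x 10 would be read as a 4 x 10240 matrix, and element (0, 1, 2, 3) would land in column 343 of row 0. Whether the reader in this PR uses exactly this convention should be checked against the code.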
