Re: [Hdf-forum] RFC: libHDF5 to support row and column major storage?

Werner Benger Tue, 12 May 2015 00:18:07 -0700

Hi Jason,

I was facing the same issues, as pretty much all use cases I know and have in my visualization software and context use and require "Fortran" order of indexing, including OpenGL graphics. It's not really an issue with HDF5, as the only thing required is to permute the indices when accessing the HDF5 API. And the HDF5 tools will of course display the data transposed then. This index permutation is supported in the F5 library via a generic permutation vector that is stored with a group of datasets sharing the same properties (the F5 library is a C library on top of HDF5 guiding towards a specific data model for various classes of data types occurring particularly in scientific visualization):
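To illustrate why permuting the indices is all that is needed, here is a minimal sketch in plain Python (no HDF5 involved). The function names are made up for this example and are not part of HDF5 or F5. It only shows that a Fortran-ordered element sits at the same linear offset as the C-ordered element with reversed indices and reversed dimensions:

```python
def row_major_offset(index, dims):
    """Linear offset with the LAST index varying fastest (C convention)."""
    offset = 0
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        offset = offset * n + i
    return offset


def col_major_offset(index, dims):
    """Linear offset with the FIRST index varying fastest (Fortran convention)."""
    offset = 0
    stride = 1
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        offset += i * stride
        stride *= n
    return offset


def fortran_to_c(index, dims):
    """Reverse both the index tuple and the dimensions (hypothetical helper)."""
    return tuple(reversed(index)), tuple(reversed(dims))


# A Fortran array of shape (4, 3, 2), element (1, 2, 0):
f_index, f_dims = (1, 2, 0), (4, 3, 2)
c_index, c_dims = fortran_to_c(f_index, f_dims)

# Both conventions address the same position in the flat buffer.
same = col_major_offset(f_index, f_dims) == row_major_offset(c_index, c_dims)
```

So the bytes on disk never move; only the order in which dimensions and indices are passed to the API changes.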


http://www.fiberbundle.net/doc/structChartDomain__IDs.html

So via the F5 API one would see the Fortran-like indexing convention, whereas whenever accessing data with the lower-level HDF5 API, it's the C-like convention (whereby the permutation vector gives the option of arbitrary permutations).

I remember there had been plans by the HDF5 group to introduce "named dataspaces", similar to "named datatypes", that could then be stored in the file as their own entity. Such would be a good place to store properties of a dataspace as attributes on a dataspace, and to have them shared among datasets. It would be a natural place to store a permutation vector, which could be reduced to a simple flag as well to just distinguish between C and Fortran indexing conventions. Of course, all the related tools would also need to honor such an attribute then. Until then, one could use an attribute on each dataset and implement index permutation similar to how the F5 library does it. It may be safer to use new API functions anyway, so as not to break old code that always expects C order indexing.
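As a rough sketch of the per-dataset attribute idea (this is not the F5 implementation, and the attribute name "index_permutation" as well as the class below are invented for illustration), a permutation vector stored next to C-ordered data could be honored by a thin wrapper like this, with a plain dict standing in for an HDF5 dataset:

```python
class PermutedView:
    """Presents a C-ordered flat buffer under a permuted index convention.

    permutation[k] names the storage axis that user-facing axis k maps to.
    """

    def __init__(self, data, storage_dims, permutation):
        if sorted(permutation) != list(range(len(storage_dims))):
            raise ValueError("permutation must use every storage axis exactly once")
        self.data = data
        self.storage_dims = tuple(storage_dims)
        self.permutation = tuple(permutation)

    @property
    def dims(self):
        # Dimensions as seen by the user of the view.
        return tuple(self.storage_dims[p] for p in self.permutation)

    def __getitem__(self, index):
        if len(index) != len(self.storage_dims):
            raise IndexError("wrong number of indices")
        # Scatter the user-facing indices onto the storage axes.
        storage_index = [0] * len(self.storage_dims)
        for k, p in enumerate(self.permutation):
            storage_index[p] = index[k]
        # Ordinary C (row-major) offset into the stored buffer.
        offset = 0
        for i, n in zip(storage_index, self.storage_dims):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            offset = offset * n + i
        return self.data[offset]


# Stand-in for a dataset: 2 x 3 in C order, plus a hypothetical attribute.
dataset = {
    "dims": (2, 3),
    "data": [0, 1, 2, 3, 4, 5],
    "attrs": {"index_permutation": (1, 0)},  # (1, 0) = Fortran view of a 2-D array
}

view = PermutedView(dataset["data"], dataset["dims"],
                    dataset["attrs"]["index_permutation"])
```

Here view.dims is (3, 2) and view[2, 1] reads the element stored at C index (1, 2). Code that ignores the attribute keeps seeing plain C order, which is the backward-compatibility point made above; the simple C/Fortran flag is just the special case of a fully reversed permutation.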


          Werner

On 12.05.2015 06:48, Jason Newton wrote:

Hi -
I've been an evangelist for HDF5 for a few years now. It is a noble and amazing library that solves data storage issues occurring in scientific applications and beyond - e.g. it can save many developers from wasting time and money so they can spend that on solving more original problems. But you guys knew that already. I think there's been a mistake though - that is the lack of first-class column-vs-row major storage. In a world where we are split down the middle on which format we use, based on the application, library and language we work in, it is an ongoing reality that there will never be one true standard to follow. But HDF5 sought to only support row-major - and I can back that up - standardizing is a good thing. But then, as time has shown, that really didn't work for a lot of folks - such as those in Matlab and Fortran - when they read our data, it looks transposed to them! When HDF5 utils/our code sees their data, it looks transposed to us! These are arguably the users you do not want to face these difficulties, as it makes it downright embarrassing at times and hard to work around within that language (ahem, Matlab again is painful to work with). Not only that, but it doesn't really scale - it will always take some manual fixing, and there's no standardized mark for whether a dataset is one of these column-major masquerading datasets. So let me assure you this is quite ugly to deal with in Matlab/etc. and doesn't seem to be the path many people take - and it can require skills many people don't have or understanding that they can't give.
But then, why did we allow saving column-major data in a row-based standard in the first place? Well, the answer seems to be performance. Surely it can't take that long to convert the datasets - most of the time at least - although there would for sure be some memory-based limitations on transposing during HDF5 I/O. But alas - the current state of the library indicates otherwise, and thus it is the user's job to correctly transform the data back and forth between application and party. But wait - wasn't this kind of activity what HDF5 was built to alleviate in the first place?
So then how do we rectify the situation? Well, speaking as a developer using HDF5 extensively and writing libraries for it - it looks to me like it should be in the core library, as it is exceedingly messy to handle on the user side each time. I think the interpretation of the dataset and its dimensions should be based on dataset creation properties. This would allow an official marking of what kind of interpretation the raw storage of the data (and dimensions?) has. However, this is only half of the battle. We'd need something like the type conversion system to permute order in all the right places if the user needs to read or write an opposing storage layout. And it should be fast and light on memory. Perhaps it would merely operate in place as a new utility subroutine taking in the mem_type and user memory. However, I can still think of one problem this does not address: compound types using a mixture of philosophies, with fields being the opposite of the dataset layout - and this case has me completely stumped, as this indicates it should be type level as well. The compound part of this is a sticky situation, but I'd still motion that the dataset creation property works for most things that occur in practice.
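For illustration only, the kind of layout conversion such a utility would perform can be sketched in plain Python. This is not an HDF5 API; the function name is hypothetical, and unlike the in-place routine suggested above this simple version allocates a second buffer (a true in-place transpose of a non-square array needs cycle-following and is more involved):

```python
from itertools import product


def col_major_to_row_major(buf, dims):
    """Return a new flat list holding the same array in row-major order.

    buf is a flat sequence laid out column-major (first index fastest).
    This out-of-place sketch uses O(N) extra memory.
    """
    total = 1
    for n in dims:
        total *= n
    if len(buf) != total:
        raise ValueError("buffer length does not match dims")

    out = [None] * total
    for index in product(*(range(n) for n in dims)):
        # Source offset: first index varies fastest.
        src, stride = 0, 1
        for i, n in zip(index, dims):
            src += i * stride
            stride *= n
        # Destination offset: last index varies fastest.
        dst = 0
        for i, n in zip(index, dims):
            dst = dst * n + i
        out[dst] = buf[src]
    return out


# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] as a column-major buffer:
fortran_buf = [1, 4, 2, 5, 3, 6]
c_buf = col_major_to_row_major(fortran_buf, (2, 3))
```

The extra copy is exactly the memory cost mentioned above, which is why a library-level version would probably want to work in chunks or in place.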
So... has the HDF5 group tried to deal with this wart yet? Let me know if anything is on the drawing board.
-Jason


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5


--
___________________________________________________________________________
Dr. Werner Benger                Visualization Research
Center for Computation & Technology at Louisiana State University (CCT/LSU)
2019  Digital Media Center, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809                        Fax.: +1 225 578-5362
