Thank you for the response.

2013/3/5 <[email protected]>

> On 2013-02-27 21:01, Pradeep Jha wrote:
>
>> Thanks for the response. So from what I understand, the HDF5 Fortran
>> wrapper automatically transposes the matrix to store it in the C
>> storage convention. So putting "dims(1) = Nx" and "dims(3) = Nz" is
>> correct. The HDF5 Fortran wrapper is just transposing the data inside
>> the program before storing it in the final h5 format.
>>
>> But this is confusing me about something.
>>
>> I convert my original unformatted data written by Fortran to h5
>> format so that I can visualize the original data using Paraview.
>> Does that mean that the data I will visualize using Paraview and the
>> h5 data file will be a transposed version of what I originally
>> intended to visualize?
>>
>
> Yes, if you are writing a multidimensional array using the Fortran APIs
> and you want to read the data back into an array with the same
> dimensions, then you need to use the Fortran APIs. If you were to use a
> C program to read the HDF5 data back, then you need to account for the
> transposition in the C program, or handle it when writing from Fortran.
>
> Here is an explanation from the archives:
>
>> HDF5 is a "self-describing" format, which means that HDF5 metadata
>> stored in a dataset object header allows the HDF5 C library, and any
>> non-C applications built on top of it, to retrieve raw data
>> (i.e. elements of a multidimensional array) in the correct order.
>>
>> (Let's for a second forget about HDF5, C and Fortran, Python and
>> Matlab :-) )
>>
>> If we have a matrix A(N,M,K), we usually count dimensions from left
>> to right saying that the first dimension has size N, the second
>> dimension has size M, the third dimension has size K, and so on.
>>
>> (Now let's talk about HDF5 but without referring to any language.)
>>
>> When we describe a matrix using an HDF5 dataspace object, we use the
>> same convention (i.e. specifying dimensions from left to right): the
>> first dimension has size N, the second dimension has size M, the
>> third dimension has size K. (Aside: Please notice that this
>> description is valid for both C and Fortran HDF5 applications, i.e.
>> the dims array needed by H5Screate_simple (h5screate_simple_f) will
>> have the values dims[] = {N,M,K} in both C and Fortran.)
>>
>> The question is: how does HDF5 know how to interpret a blob of
>> {N x M x K x sizeof(datatype)} bytes of dataset raw data stored in
>> the file? Was A(N,M,K) stored? Or was A(K,N,M) stored? Or any other
>> permutation of (K,N,M)?
>>
>> An HDF5 file has no clue about matrices and their dimensions, or the
>> languages they were written from. It is the application's
>> responsibility to interpret data correctly and pass the correct
>> interpretation to the HDF5 C library to store in a file.
>>
>> As mentioned above, the dimensions of the matrix are described
>> using an HDF5 dataspace object and are stored in the file. d integers
>> P1, ..., Pd, where d is the rank of the matrix, are stored in the
>> dataspace object header according to the following convention: the
>> last value, Pd, is the size of the FASTEST changing dimension of the
>> matrix, i.e. the HDF5 file spec and HDF5 C library follow the C
>> storage convention (no wonder, it is a C library :-). Therefore there
>> is no ambiguity in interpreting {N x M x K x sizeof(datatype)} bytes,
>> and the HDF5 file has enough information for any "row-major" or
>> "column-major" application to interpret the data correctly (including
>> applications that bypass the HDF5 C library and read directly from
>> the HDF5 file!)
>>
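
A minimal sketch of the storage convention described above, in plain
Python with no HDF5 involved. The function name row_major_offset and the
example sizes (2, 3, 4) are illustrative assumptions, not part of any
HDF5 API. It only shows that once the sizes P1, ..., Pd are known and the
last one is the fastest changing, the position of every element in the
raw blob is determined.

```python
def row_major_offset(index, dims):
    """Return the 0-based element offset of `index` in a flat buffer
    whose LAST dimension changes fastest (C storage convention)."""
    if len(index) != len(dims):
        raise ValueError("index and dims must have the same rank")
    offset = 0
    for i, size in zip(index, dims):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        # Each earlier dimension is scaled by the sizes of all later ones.
        offset = offset * size + i
    return offset


# Hypothetical sizes P1, P2, P3 as they would sit in a dataspace header.
dims = (2, 3, 4)

print(row_major_offset((0, 0, 1), dims))  # 1: one step in the last dimension
print(row_major_offset((0, 1, 0), dims))  # 4: one step in the middle dimension
print(row_major_offset((1, 0, 0), dims))  # 12: one step in the first dimension
```

Multiplying the offset by sizeof(datatype) would give the byte position
inside the blob.
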
>> Here is what is happening when HDF5 Fortran library is used:
>>
>> Suppose we want to write an A(N,M,K) matrix to the HDF5 file. The
>> HDF5 Fortran API describes the dataspace with the first dimension
>> being N, the second dimension being M, the third dimension being K
>> (as we would do it in C and any other language). But the HDF5 Fortran
>> API also knows that the fastest changing dimension has size N (i.e.
>> we have column-major order). Therefore the HDF5 Fortran library
>> instructs the C library to store the values K,M,N in the dataspace
>> object header instead of N,M,K, since N is the size of the fastest
>> changing dimension.
>>
>> So, if a C application reads the matrix A(N,M,K) written from
>> Fortran (i.e. the N x M x K x sizeof(datatype) blob), it will read it
>> into the matrix B(K,M,N) (the C API that requests the sizes of the
>> first, second and third dimensions will return the values K,M,N
>> stored in the dataspace header).
>>
>> If a Fortran application reads the matrix A(N,M,K) written from
>> Fortran, it will read it once again into B(N,M,K) (the Fortran API
>> that requests the sizes of the first, second and third dimensions
>> will flip the array K,M,N stored in the file and return N,M,K).
>>
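
The read-back behaviour described above can be checked with a small
sketch in plain Python (again no HDF5 calls; the helper names and the
sizes N, M, K = 2, 3, 4 are assumptions chosen for illustration). It
lays out a flat buffer the way Fortran stores A(N,M,K), then indexes the
same untouched buffer the way C would index B[K][M][N].

```python
def column_major_offset(index, dims):
    """0-based offset with the FIRST dimension fastest (Fortran order)."""
    offset = 0
    for i, size in zip(reversed(index), reversed(dims)):
        offset = offset * size + i
    return offset


def row_major_offset(index, dims):
    """0-based offset with the LAST dimension fastest (C order)."""
    offset = 0
    for i, size in zip(index, dims):
        offset = offset * size + i
    return offset


N, M, K = 2, 3, 4                          # illustrative sizes only
fortran_dims = (N, M, K)                   # what the Fortran program declares
file_dims = tuple(reversed(fortran_dims))  # (K, M, N), as stored in the header

# Fill the buffer in Fortran memory order; each element records its
# own Fortran index (i, j, k) so it can be recognised later.
buffer = [None] * (N * M * K)
for i in range(N):
    for j in range(M):
        for k in range(K):
            buffer[column_major_offset((i, j, k), fortran_dims)] = (i, j, k)

# Index the SAME buffer in C order with the reversed dims:
# B[k][j][i] should be the element Fortran calls A(i,j,k).
mismatches = 0
for k in range(K):
    for j in range(M):
        for i in range(N):
            if buffer[row_major_offset((k, j, i), file_dims)] != (i, j, k):
                mismatches += 1

print(file_dims, mismatches)  # (4, 3, 2) 0
```

No element of the buffer is moved at any point; only the order of the
dimension sizes, and therefore of the indices, is reversed.
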
>> In other words: the HDF5 library stores information about how to
>> interpret data. Interpretation follows the C storage convention: the
>> last dimension specified for the dataspace object is the fastest
>> changing one. It is the responsibility of the application (in this
>> case the Fortran HDF5 library) to interpret the order of dimensions
>> correctly and pass it to and from the HDF5 C library.
>>
>> Please notice that there is no need to transpose the data itself: one
>> only has to pass a correct interpretation of the data to the HDF5 C
>> library and make sure it is done according to the HDF5 C library
>> convention: the first value stored in the dataspace header
>> corresponds to the slowest changing dimension, ..., and the last
>> value stored in the dataspace header corresponds to the fastest
>> changing dimension.
>>
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>