Thanks Elena.
I still have some more questions. I am trying to optimize my datasets for faster access. My observation is that the time to access the complete data increases linearly with the number of datasets. Say there is only 1 dataset in each group versus 10 datasets in each group: I wrote a program to read just one dataset from each group in both files. The first file, which contains only one dataset per group, takes much less time than the file with 10 datasets; the second file is 2-3 times slower than the first (3 seconds vs. 9 seconds), even though both reads return the same amount of data (the dataset I read from both files contains the same data). I have repacked the file so that the chunk size equals the dimensions of each dataset.

This is not what I expected. Since the structure of an HDF5 file is similar to a Unix file system, the number of datasets should not affect the access time as long as you are reading the same amount of data. Datasets are accessed using point selections. What is it that I am doing wrong? How can I maximize my reading performance?

My data looks as follows: each group has 5 levels of market data. Different formats I have tried:

File 1: one dataset per group, holding level 1 only. (Fastest, but only provides level 1.)
File 2: 5 datasets per group, one per level. (This scales linearly and is slower than the File 1 format; even if I read only level 1, it is still very slow compared to File 1.)
File 3: all 5 levels in one dataset per group (spread horizontally).

My reading access pattern: I have to read either only level 1 or all 5 levels together. I am thinking of 2 different files, one with the level-1 dataset and the other with all the datasets in one file, but I feel this is quite inefficient. I would like to keep all the data in a single file. Do you guys have any suggestions?
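For what it's worth, here is a minimal h5py sketch of the File-2 vs. File-3 layouts described above. The group name, row count, and two-column-per-level shape are assumptions for illustration, not the poster's actual schema. The point it shows: with all levels side by side in one dataset, reading only level 1 is a single contiguous hyperslab touching one object's metadata, instead of opening one of five separate datasets.

```python
# Sketch only -- N_ROWS, the "AAPL" group name, and the 2-columns-per-level
# layout are assumptions for illustration.
import h5py
import numpy as np

N_ROWS = 1000

with h5py.File("layout_demo.h5", "w") as f:
    g = f.create_group("AAPL")
    # File-2 style: one dataset per level -> 5 objects' metadata per group
    for lvl in range(1, 6):
        g.create_dataset(f"level{lvl}", data=np.random.rand(N_ROWS, 2),
                         chunks=(N_ROWS, 2))   # one chunk = whole dataset
    # File-3 style: all 5 levels side by side in a single wide dataset
    g.create_dataset("all_levels", data=np.random.rand(N_ROWS, 10),
                     chunks=(N_ROWS, 10))

with h5py.File("layout_demo.h5", "r") as f:
    # Reading only level 1 from the wide dataset is one hyperslab selection
    # on one dataset, rather than a lookup-and-open of a separate object.
    level1 = f["AAPL/all_levels"][:, 0:2]
    print(level1.shape)  # (1000, 2)
```

With this layout, "read level 1 only" and "read all 5 levels" are both single-dataset reads differing only in the column slice, which avoids keeping two copies of the data in separate files.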
Alok Jadhav
GAT IT

From: [email protected] [mailto:[email protected]] On Behalf Of Elena Pourmal
Sent: Monday, August 27, 2012 8:45 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] why changing the format had adverse effect

Hi Alok,

Please try running the h5stat tool (http://www.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Stat) to see how space is allocated in the file for raw data and for HDF5 metadata.

Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Aug 26, 2012, at 9:15 PM, alokjadhav wrote:

Hi, could someone comment on this? I am still not sure why the new format, with fewer elements, is taking so much more storage space.

One more observation: format 1 has around 300 groups, each with 2 datasets, for a total of 600 datasets. Format 2 has around 200 groups, each with 11 datasets, for a total of 2,200 datasets. In format 1 each dataset is a double array, whereas in format 2 each dataset is a compound type (doubles and ints mixed). What is the overhead of a compound datatype versus a double array? Can having 2,200 datasets instead of 600 double the size of the HDF5 file? I am basically converting horizontal data into vertical data with more datasets.

Regards,
Alok

--
View this message in context: http://hdf-forum.184993.n3.nabble.com/why-changing-the-format-had-adverse-effect-tp4025330p4025344.html
Sent from the hdf-forum mailing list archive at Nabble.com.
_______________________________________________
Hdf-forum is for HDF software users discussion.
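On the quoted question of whether 2,200 datasets instead of 600 can noticeably grow the file: every HDF5 dataset carries its own metadata (object header, datatype and dataspace messages, chunk index), so per-dataset overhead scales with dataset count regardless of how much raw data each holds. A back-of-envelope sketch, assuming a round ~2 KiB of metadata per dataset (an assumed figure, not a measured one; h5stat reports the real breakdown for a given file):

```python
# Back-of-envelope sketch: why dataset count drives metadata size.
# PER_DATASET_OVERHEAD is an ASSUMED figure (object header + chunk
# index), not a measurement -- run h5stat on the real files instead.
PER_DATASET_OVERHEAD = 2048               # bytes, assumed

fmt1 = 600 * PER_DATASET_OVERHEAD         # 300 groups x 2 datasets
fmt2 = 2200 * PER_DATASET_OVERHEAD        # 200 groups x 11 datasets

print(fmt1, fmt2, fmt2 / fmt1)
# format 2 carries ~3.7x format 1's per-dataset metadata
```

Whatever the true per-dataset figure is, the ratio (2200/600 ≈ 3.7x) holds, which is consistent with the observed growth; the compound datatype itself adds only a datatype message per dataset plus any padding in the on-disk record layout.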
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
