It looks like you are taking ASCII data (which may be numerical) and storing it in HDF5 datasets as character data. If the ASCII input is (for the most part) numerical, you need to parse the numbers and convert them from their string form (e.g. 'char *foo="1.2345"') to their numerical form (e.g. 'float foo=1.2345') and then write the dataset as H5T_NATIVE_FLOAT. Otherwise, all you are doing is storing the same ASCII bytes in HDF5 datasets and also paying for all the additional HDF5 metadata on top of them.
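
As a rough sketch of that conversion step, assuming the files hold whitespace-separated numbers (I have not seen your data, so that is a guess): the helper names parse_floats and write_floats and the USE_HDF5 macro below are made up for illustration. Only strtof and H5LTmake_dataset are real library calls.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef USE_HDF5            /* hypothetical macro: define it when building against HDF5 */
#include <hdf5.h>
#include <hdf5_hl.h>
#endif

/* Hypothetical helper: convert whitespace-separated ASCII numbers in `text`
 * into floats. Stores at most max_out values in out[].
 * Returns the number of values stored, or -1 if a token is not a number
 * or strtof reports a range error. */
long parse_floats(const char *text, float *out, size_t max_out)
{
    assert(text != NULL && out != NULL);

    const char *p = text;
    size_t n = 0;

    while (n < max_out) {
        char *end;
        errno = 0;
        float v = strtof(p, &end);   /* skips leading whitespace itself */

        if (end == p) {
            /* Nothing converted: either only whitespace is left, or the
             * next token is not numeric. */
            while (isspace((unsigned char)*p))
                p++;
            return (*p == '\0') ? (long)n : -1;
        }
        if (errno == ERANGE)
            return -1;

        out[n++] = v;
        p = end;
    }
    return (long)n;                  /* output buffer is full */
}

#ifdef USE_HDF5
/* Hypothetical helper: write n parsed values as a 1-D float dataset,
 * instead of writing the original text as H5T_NATIVE_CHAR. */
herr_t write_floats(hid_t file_id, const char *name,
                    const float *vals, size_t n)
{
    hsize_t dims[1] = { (hsize_t)n };
    return H5LTmake_dataset(file_id, name, 1, dims, H5T_NATIVE_FLOAT, vals);
}
#endif
```

Each value then costs sizeof(float) bytes in the dataset rather than one byte per character of its printed form. Note that the first argument to H5LTmake_dataset has to be an open file or group identifier (hid_t), for example one returned by H5Fcreate.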

On Tue, 2010-12-14 at 16:12 -0800, Collignon, Barbara C. wrote:
> I have ~98K ASCII files ( ~4K each so ~380MB in total)
>
> I convert those files into a single binary file (HDF5_FILE)
> using basic functions:
>
> (..)
> char buffer[FILE_LEN][LINE_LEN];
> foreach (file in list_of_files)
> buffer=get(all_lines_in_the_file);
> dataset =
> H5LTmake_dataset(HDF5_FILE,DATASET_NAME,2,dimsfx,H5T_NATIVE_CHAR,buffer);
> end
> (..)
>
> but the HDF5_FILE final size is ~1GB ... almost 3 times the size of the
> ASCII files put all together.
>
> Could someone please sheds light on that point ?
>
> Barbara
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected] urgent: [email protected]
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
