Hi Jay,

I am saving and loading floats by converting them to 16-bit half floats in HDF5 quite efficiently, so I am fairly sure you can do the same with your data. I may be blind, but I do not see where you register your conversion functions; or do you expect HDF5 to convert implicitly by truncating? Maybe that is where your code loses its efficiency.
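For reference, "truncating" a little-endian int32 to a 3-byte value just drops the top byte, so it only round-trips for values in [-2^23, 2^23), and reading the 3 bytes back requires a sign extension. A minimal standalone sketch of those semantics (the helper names are mine, not HDF5 API):

```c
#include <stdint.h>

/* Pack one little-endian int32 into 3 bytes by keeping the low 3 bytes
 * and dropping the top byte. Lossless only for values in [-2^23, 2^23). */
static void pack24(int32_t v, uint8_t out[3]) {
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
}

/* Unpack 3 little-endian bytes back to int32, sign-extending bit 23. */
static int32_t unpack24(const uint8_t in[3]) {
    int32_t v = (int32_t)in[0] | ((int32_t)in[1] << 8) | ((int32_t)in[2] << 16);
    if (v & 0x800000)    /* sign bit of the 24-bit value is set */
        v -= 1 << 24;    /* extend the sign into the top byte */
    return v;
}
```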
HTH
Dimitris Servis

P.S. I have found that for large datasets (like yours) of reduced-precision numbers that you need to store for later post-processing and don't need to keep in memory, it is more efficient to do the conversion in place and then write to HDF5 using the reduced datatype.

2011/3/18 Jay Banyer <[email protected]>

> Gday,
>
> I'm using 24-bit integers with HDF5 but finding the performance very poor.
>
> I'm new to HDF5. I'm evaluating the format for use with a radio telescope.
> The telescope will produce about 7 TB per 12 hours of raw data, so space
> and write efficiency are important.
>
> The telescope system produces 24-bit signed integers. If we convert to
> 32 bits our files will grow by 33%, ie over 9.5 TB instead of 7 TB.
>
> I successfully wrote 24-bit integers with HDF5. Unfortunately the writing
> is very inefficient: the CPU sits at 100% and the write rate (in integers
> per second) is about five times slower than with 32-bit ints, even though
> the file is smaller. The file writing is CPU-bound: the disk is hardly
> working at all, unlike with 32 bits, where it's disk-bound.
>
> It appears that the conversion from 32 bits in memory to 24 bits by the
> library is very inefficient. I've done this with my own code and it's
> possible to do it very quickly, so the writing is still disk-bound. In
> all cases I'm using little-endian, as my platform is Intel.
>
> Is it possible to tune HDF5 to write 24-bit integers more efficiently?
> I've included code snippets below.
>
> Cheers,
> Jay.
>
> void write_integration_hdf5(struct cmac_cells* cmac,
>                             struct corr_packet_header* header, hid_t f) {
>     static hid_t datatype = -1;
>     static hid_t dataspace = -1;
>
>     // We use either 32- or 24-bit types for values, according to args.
>     // The 24-bit type is custom made.
>     if (datatype < 0) {
>         if (arg_24bits) {
>             // Define an HDF5 custom type for 24-bit int, little-endian
>             // (actually, native)
>             datatype = H5Tcopy(H5T_NATIVE_INT32);
>             H5Tset_size(datatype, 3); // 3 bytes
>             if (datatype < 0) {
>                 fprintf(stderr, "Error creating HDF5 24 bit int type\n");
>                 exit(-1);
>             }
>         } else {
>             datatype = H5T_NATIVE_INT32;
>         }
>     }
>
>     // Define the HDF5 dataspace, ie the rank and size of the dataset array
>     if (dataspace < 0) {
>         int rank = 2; // baseline x re/im
>         hsize_t dims[] = {NUM_CMAC_CELLS*NUM_VIS_PER_CELL, 2};
>         dataspace = H5Screate_simple(rank, dims, dims);
>         if (dataspace < 0) {
>             fprintf(stderr, "Error creating HDF5 dataspace\n");
>             exit(-1);
>         }
>     }
>
>     // Add a dataset
>     char name[50];
>     sprintf(name, "INT%d_FREQ%d", header->integration_num,
>             header->frequency);
>     hid_t dataset = H5Dcreate(f, name, datatype, dataspace, H5P_DEFAULT);
>     if (dataset < 0) {
>         fprintf(stderr, "Error creating dataset %s\n", name);
>         exit(-1);
>     }
>
>     // Write the vis data to the new dataset
>     const char* buffer = (const char*)cmac->cells;
>     if (H5Dwrite(dataset, H5T_NATIVE_INT32, H5S_ALL, H5S_ALL, H5P_DEFAULT,
>                  buffer) < 0) {
>         fprintf(stderr, "Error writing to dataset %s\n", name);
>         exit(-1);
>     }
>
>     // Close the dataset
>     H5Dclose(dataset);
> }
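The in-place approach from the P.S. above can be sketched as follows: compact the 32-bit buffer into 3-byte little-endian records yourself, then hand it to H5Dwrite with the 3-byte datatype as both memory and file type, so the library performs no per-element conversion. This is only a sketch of the compaction step under the thread's assumptions (little-endian host, all values fit in 24 bits); the function name is mine:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compact n little-endian int32 values to 3-byte records, in place.
 * Each record keeps the low 3 bytes of its source value (drops the top
 * byte), and the packed data occupies the first 3*n bytes of the buffer.
 * A forward pass is safe because the write position (3*i) never
 * overtakes the read position (4*i). Returns the packed size in bytes. */
static size_t pack24_inplace(int32_t *buf, size_t n) {
    uint8_t *bytes = (uint8_t *)buf;
    for (size_t i = 0; i < n; i++)
        memmove(bytes + 3 * i, bytes + 4 * i, 3); /* overlaps at small i */
    return 3 * n;
}
```

After packing, the H5Dwrite call would pass the custom 3-byte datatype (rather than H5T_NATIVE_INT32) as the memory type, making the write a plain memcpy-and-flush on the library side.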
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
