Thanks, Dan. This was very helpful.

Ryan
On Fri, Aug 19, 2011 at 9:40 AM, Daniel Kahn <[email protected]> wrote:

> Ryan,
>
> Ryan Price wrote:
>
>> I have a 4GB binary dump of data that I'd like to store as an HDF5
>> dataset (using command line tools if possible). The dataset will have
>> the dimensions 31486448 x 128. I believe this is too big to import as
>> a dataset in one go.
>
> 4GB is a large array. You may wish to give some thought to how the
> data will be used after you have created the file. Will the end user
> really process all 4GB at once? HDF5 provides chunking and compression
> functionality which (transparently to the data reader, and almost
> transparently to the writer) will store the data in "chunks" and
> compress them as well, if you'd like. If you can make the chunk size
> close to the amount of data the end user will want to access, it can
> be very convenient.
>
> Here is a piece of Python I wrote to demonstrate creating a file with
> chunking and compression. I was able to open the file and view the
> dataset properties in HDFView, but not the dataset itself, because the
> array is so large. You can use this code if the major axis of your
> binary dump is in the 128 direction. If it is in the other direction,
> you'll probably want to choose different chunking parameters and read
> the binary data off disk appropriately. (By the way, I get a 3.8MB
> file, since the array contains a single value and I've turned on
> compression.)
>
> import numpy
> import h5py
>
> with h5py.File('BigArray.h5', 'w') as fid:
>     # Chunked, gzip-compressed dataset: one chunk per column of 128.
>     dset = fid.create_dataset('BigArray', shape=(31486448, 128),
>                               dtype='int8', chunks=(31486448, 1),
>                               compression='gzip')
>     # 1-D array of length 31486448, standing in for one column's data.
>     slicearray = numpy.ones([31486448], dtype='int8')
>     for i in range(128):
>         # Replace this comment with a read from the binary file
>         # into slicearray.
>         print("Writing", i)
>         dset[:, i] = slicearray  # populate slice i of the HDF5 dataset
>
> Cheers,
> --dan
>
>> Running h5import gives the following error:
>>
>>     Unable to allocate dynamic memory.
>>     Error in allocating unsigned integer data storage.
>>     Error in reading the input file: my_data
>>     Program aborted.
>>
>> So I split the binary dump into four files which can be imported. I'd
>> still like to have one 31486448 x 128 dataset but am not sure that's
>> possible to do.
>>
>> Any idea how I could combine these four binary dumps into one dataset?
>> Maybe create a single dataset and append each small one...?
>>
>> Thanks,
>>
>> Ryan
>
> --
> Daniel Kahn
> Science Systems and Applications Inc.
> 301-867-2162
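To make Dan's point about matching the chunk shape to the reader's access pattern concrete: with the file his script writes, pulling out a single column touches exactly one compressed chunk. A minimal read-back sketch (the column index 5 is an arbitrary example):

    import h5py

    # Open the file written by Dan's script above.
    with h5py.File('BigArray.h5', 'r') as fid:
        dset = fid['BigArray']
        # One column corresponds to one (31486448, 1) chunk, so only
        # that chunk has to be read from disk and decompressed.
        col = dset[:, 5]
        print(col.shape, dset.chunks, dset.compression)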
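On Ryan's remaining question about combining the four split dumps: no appending is needed. Since the final shape is known up front, one option is to create the full 31486448 x 128 dataset once and write each piece into its own row range (a hyperslab). A sketch along those lines, assuming the dump was split evenly along the 31486448 axis and stored row-major as int8; the part file names and the output name Combined.h5 are hypothetical:

    import numpy
    import h5py

    NROWS, NCOLS = 31486448, 128
    part_files = ['part0.bin', 'part1.bin', 'part2.bin', 'part3.bin']
    rows_per_part = NROWS // len(part_files)  # 7871612 rows, ~1GB apiece

    with h5py.File('Combined.h5', 'w') as fid:
        dset = fid.create_dataset('BigArray', shape=(NROWS, NCOLS),
                                  dtype='int8',
                                  chunks=(rows_per_part, 1),
                                  compression='gzip')
        for n, fname in enumerate(part_files):
            # Load one ~1GB piece and write it into its row range of
            # the single big dataset.
            piece = numpy.fromfile(fname, dtype='int8')
            piece = piece.reshape(rows_per_part, NCOLS)
            dset[n * rows_per_part:(n + 1) * rows_per_part, :] = piece
            print("Wrote", fname)

The chunk shape (rows_per_part, 1) is just one choice; it keeps each write aligned to whole chunks, so no compressed chunk has to be re-read and rewritten. If the final size were not known in advance, h5py can also grow a dataset created with maxshape= via dset.resize(), but preallocating is simpler here.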
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
