Hi

Yes, the content is more or less a random set of characters. I'll try some combinations and see which works best. We need to transfer the file over a network, which is why we want to compress it as much as possible. Will the block size/chunk size make any difference?
cheers, Håkon

On 25 March 2010 15:48, Peter Cao <[email protected]> wrote:
> Hi Håkon,
>
> I don't need the code. As long as it works for you, I am happy.
>
> Deflate level 6 is a good compromise between file size and performance.
> The compression ratio depends on the content. If every string is like a
> random set of characters, compression will not help much. I will leave
> it to you to try different compression options. If compression does not
> help much, it is better not to use compression at all. It's your call.
>
> Thanks
> --pc
>
> Håkon Sagehaug wrote:
>>
>> Hi Peter,
>>
>> Thanks for all the help so far. I've added code to handle the last
>> elements; if you want it, I can paste it in a new email. One more
>> question: we need to compress the data. I've now tried the following,
>> within createDataset(...):
>>
>>     H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>>     H5.H5Pset_chunk(plist, RANK, chunkSize);
>>     H5.H5Pset_deflate(plist, 9);
>>
>> I'm not sure what is most efficient. I tried exchanging
>> H5.H5Pset_deflate(plist, 9) with
>>
>>     H5.H5Pset_szip(plist, HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);
>>
>> but did not see any difference. I read that szip might be better. If I
>> don't use deflate, the HDF5 file is 1.5 GB; with deflate it's 1.3 GB, so
>> my hope is that it can be decreased further in size.
>>
>> cheers, Håkon
>>
>> On 24 March 2010 17:25, Peter Cao <[email protected]> wrote:
>>
>> Hi Håkon,
>>
>> Glad to know it works for you. You also need to take care of the case
>> where the last block does not have the size BLOCK_SIZE. This will
>> happen if the total size (25M) is not divisible by BLOCK_SIZE. For
>> better performance, make sure that BLOCK_SIZE is divisible by
>> CHUNK_SIZE.
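Peter's point about the last partial block comes down to integer division. A minimal sketch (plain Java, no HDF5; the helper name `plan` is invented for illustration):

```java
public class BlockPlan {
    // Hypothetical helper: how many full blocks a dataset of `total`
    // elements splits into, and how many elements are left over for a
    // final partial block (0 means every block is full).
    static long[] plan(long total, long blockSize) {
        return new long[] { total / blockSize, total % blockSize };
    }

    public static void main(String[] args) {
        long[] p = plan(25_000_000L, 250_000L);
        System.out.println(p[0] + " full blocks, tail of " + p[1]); // 100 full blocks, tail of 0

        // If the total were not divisible by BLOCK_SIZE, the tail would
        // have to be written with a smaller hyperslab count.
        p = plan(25_000_100L, 250_000L);
        System.out.println(p[0] + " full blocks, tail of " + p[1]); // 100 full blocks, tail of 100
    }
}
```

With 25M elements and BLOCK_SIZE 250000 the tail is 0, which is why the program below gets away without the partial-block case.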
>>
>> Thanks
>> --pc
>>
>> Håkon Sagehaug wrote:
>>
>> Hi Peter,
>>
>> Thanks so much for the code; it seems to work very well. The only thing
>> I found was that the start index for the next block to write into the
>> HDF5 array needed 1 added to it, so instead of
>>
>>     start_idx = i;
>>
>> I now have
>>
>>     start_idx = i + 1;
>>
>> cheers, Håkon
>>
>> On 24 March 2010 01:19, Peter Cao <[email protected]> wrote:
>>
>> Hi Håkon,
>>
>> Below is a program that you can start with. I am using variable-length
>> strings. For fixed-length strings there is some extra work: you may
>> have to make all the strings the same length.
>>
>> You may try different chunk sizes and block sizes to get the best
>> performance.
>>
>> =======================
>> import ncsa.hdf.hdf5lib.H5;
>> import ncsa.hdf.hdf5lib.HDF5Constants;
>> import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;
>>
>> public class CreateStrings {
>>
>>     private final static String H5_FILE = "G:\\temp\\strings.h5";
>>     private final static String DNAME = "/strs";
>>     private final static int RANK = 1;
>>     private final static long[] DIMS = { 25000000 };
>>     private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };
>>     private final static long[] CHUNK_SIZE = { 25000 };
>>     private final static int BLOCK_SIZE = 250000;
>>
>>     private void createDataset(int fid) throws Exception {
>>         int did = -1, tid = -1, sid = -1, plist = -1;
>>
>>         try {
>>             tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
>>             // use variable length to save space
>>             H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
>>             sid = H5.H5Screate_simple(RANK, DIMS, MAX_DIMS);
>>
>>             // figure out creation properties
>>             plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
>>             H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>>             H5.H5Pset_chunk(plist, RANK, CHUNK_SIZE);
>>
>>             did = H5.H5Dcreate(fid, DNAME, tid, sid, plist);
>>         } finally {
>>             try { H5.H5Pclose(plist); } catch (HDF5Exception ex) { }
>>             try { H5.H5Sclose(sid); } catch (HDF5Exception ex) { }
>>             try { H5.H5Dclose(did); } catch (HDF5Exception ex) { }
>>         }
>>     }
>>
>>     private void writeData(int fid) throws Exception {
>>         int did = -1, tid = -1, msid = -1, fsid = -1;
>>         long[] count = { BLOCK_SIZE };
>>
>>         try {
>>             did = H5.H5Dopen(fid, DNAME);
>>             tid = H5.H5Dget_type(did);
>>             fsid = H5.H5Dget_space(did);
>>             msid = H5.H5Screate_simple(RANK, count, null);
>>             String[] strs = new String[BLOCK_SIZE];
>>
>>             int idx = 0, block_indx = 0, start_idx = 0;
>>             long t0 = 0, t1 = 0;
>>             t0 = System.currentTimeMillis();
>>             System.out.println("Total number of blocks = "
>>                     + (DIMS[0] / BLOCK_SIZE));
>>             for (int i = 0; i < DIMS[0]; i++) {
>>                 strs[idx++] = "str" + i;
>>                 if (idx == BLOCK_SIZE) { // operator % is very expensive
>>                     idx = 0;
>>                     H5.H5Sselect_hyperslab(fsid,
>>                             HDF5Constants.H5S_SELECT_SET,
>>                             new long[] { start_idx }, null, count, null);
>>                     H5.H5Dwrite(did, tid, msid, fsid,
>>                             HDF5Constants.H5P_DEFAULT, strs);
>>
>>                     if (block_indx == 10) {
>>                         t1 = System.currentTimeMillis();
>>                         System.out.println("Total time (minutes) = "
>>                                 + ((t1 - t0) * (DIMS[0] / BLOCK_SIZE))
>>                                 / 1000 / 600);
>>                     }
>>
>>                     block_indx++;
>>                     start_idx = i;
>>                 }
>>             }
>>         } finally {
>>             try { H5.H5Sclose(fsid); } catch (HDF5Exception ex) { }
>>             try { H5.H5Sclose(msid); } catch (HDF5Exception ex) { }
>>             try { H5.H5Dclose(did); } catch (HDF5Exception ex) { }
>>         }
>>     }
>>
>>     private void createFile() throws Exception {
>>         int fid = -1;
>>
>>         fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
>>                 HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
>>
>>         if (fid < 0)
>>             return;
>>
>>         try {
>>             createDataset(fid);
>>             writeData(fid);
>>         } finally {
>>             H5.H5Fclose(fid);
>>         }
>>     }
>>
>>     /**
>>      * @param args
>>      */
>>     public static void main(String[] args) {
>>         try {
>>             (new CreateStrings()).createFile();
>>         } catch (Exception ex) {
>>             ex.printStackTrace();
>>         }
>>     }
>> }
>> =========================
>>
>> --
>> Håkon Sagehaug, Scientific Programmer
>> Parallab, Uni BCCS/Uni Research
>> [email protected], phone +47 55584125
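The `start_idx = i + 1` fix from earlier in the thread can be checked with a toy simulation of the write loop (plain Java; the `H5Dwrite` call is replaced by recording the offset, and the sizes are toy values, not the real BLOCK_SIZE):

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {
    // Simulate the write loop: return the file offsets at which each
    // block-sized H5Dwrite would start, using the i + 1 fix.
    static List<Integer> blockOffsets(int total, int block) {
        List<Integer> offsets = new ArrayList<>();
        int idx = 0, start_idx = 0;
        for (int i = 0; i < total; i++) {
            idx++; // one element buffered, as strs[idx++] does
            if (idx == block) {
                idx = 0;
                offsets.add(start_idx); // where H5Dwrite would start
                // element i was the last one buffered, so the next block
                // begins one past it; with plain start_idx = i, the
                // second block would start one element too early
                start_idx = i + 1;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 12 elements in blocks of 4 should be written at offsets 0, 4, 8
        System.out.println(blockOffsets(12, 4)); // [0, 4, 8]
    }
}
```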
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
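As an aside on the compressibility question at the top of the thread: a small standalone probe with java.util.zip.Deflater (the same deflate/zlib algorithm behind H5Pset_deflate; the class name, 1 MiB size, and levels here are just illustrative) makes Peter's point measurable before committing to a filter:

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressProbe {
    // Deflate `data` at the given level and return the compressed size.
    static int deflatedSize(byte[] data, int level) {
        Deflater d = new Deflater(level);
        d.setInput(data);
        d.finish();
        // leave headroom: incompressible input grows slightly when deflated
        byte[] out = new byte[data.length + data.length / 2 + 64];
        int n = 0;
        while (!d.finished())
            n += d.deflate(out, n, out.length - n);
        d.end();
        return n;
    }

    public static void main(String[] args) {
        int size = 1 << 20; // 1 MiB of sample data
        byte[] randomData = new byte[size];
        new Random(42).nextBytes(randomData);          // random bytes: incompressible
        byte[] repetitive = new byte[size];
        Arrays.fill(repetitive, (byte) 'A');           // highly redundant

        System.out.println("random,     level 6: " + deflatedSize(randomData, 6));
        System.out.println("repetitive, level 6: " + deflatedSize(repetitive, 6));
        System.out.println("repetitive, level 9: " + deflatedSize(repetitive, 9));
    }
}
```

If the real strings behave like the random case (compressed size at or above the input size), skipping the filter entirely avoids paying the CPU cost for nothing, as Peter suggests.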
