Hi

Yes, the content is more or less a random set of characters. I'll try some
combinations and see which works best. We need to transfer the file over a
network, which is why we want to compress it as much as possible. Does the
block size/chunk size have any effect on the compression?
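
To compare the combinations I plan to do a small sweep along the lines of the
sketch below: create the dataset with different settings and check the
resulting file sizes. The names are illustrative and the actual data writing
is omitted. Since deflate compresses each chunk independently, larger chunks
should at least give the compressor more context to work with.

=======================
import java.io.File;

import ncsa.hdf.hdf5lib.H5;
import ncsa.hdf.hdf5lib.HDF5Constants;

public class CompressionSweep {
    public static void main(String[] args) throws Exception {
        long[] dims = { 1000000 }; // illustrative size, not the real 25M
        for (int level : new int[] { 1, 6, 9 }) {
            for (long chunk : new long[] { 1000, 25000, 100000 }) {
                String file = "sweep_" + level + "_" + chunk + ".h5";
                int fid = H5.H5Fcreate(file, HDF5Constants.H5F_ACC_TRUNC,
                        HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
                int tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
                H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
                int sid = H5.H5Screate_simple(1, dims, null);
                int plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
                H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
                H5.H5Pset_chunk(plist, 1, new long[] { chunk });
                H5.H5Pset_deflate(plist, level);
                int did = H5.H5Dcreate(fid, "/strs", tid, sid, plist);
                // ... write the real strings here, as in the program below ...
                H5.H5Dclose(did);
                H5.H5Pclose(plist);
                H5.H5Sclose(sid);
                H5.H5Tclose(tid);
                H5.H5Fclose(fid);
                System.out.println(file + ": "
                        + new File(file).length() + " bytes");
            }
        }
    }
}
=======================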

cheers, Håkon

On 25 March 2010 15:48, Peter Cao <[email protected]> wrote:

> Hi Håkon,
>
> I don't need the code. As long as it works for you, I am happy.
>
> Deflate level 6 is a good trade-off between file size and performance.
> The compression ratio depends on the content: if every string looks like
> a random set of characters, compression will not help much. I will leave
> it to you to try the different compression options. If compression does
> not help much, it is better not to use compression at all. It's your
> call.
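>
> If you want a quick comparison, the two options differ only in the dataset
> creation property list. A minimal sketch, using the names from your code:
>
>     // option 1: chunked layout + deflate level 6
>     H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>     H5.H5Pset_chunk(plist, RANK, chunkSize);
>     H5.H5Pset_deflate(plist, 6);
>
>     // option 2: no compression, simply omit the H5Pset_deflate call
>     // (keep the chunked layout; it is required for extendible datasets)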
>
>
> Thanks
> --pc
>
>
>
> Håkon Sagehaug wrote:
>
>>
>>
>> Hi Peter
>>
>> Thanks for all the help so far. I've added code to write the last
>> elements; if you want it, I can paste it into a new email. One more
>> question: we need to compress the data. I've now tried the following,
>> within createDataset(...):
>>
>>
>> H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>> H5.H5Pset_chunk(plist, RANK, chunkSize);
>> H5.H5Pset_deflate(plist, 9);
>>
>> I'm not sure which option is the most efficient. I tried exchanging
>> H5Pset_deflate(plist, 9) with
>>
>> H5.H5Pset_szip(plist, HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);
>>
>> but did not see any difference. I had read that szip might be better.
>> Without deflate the HDF5 file is 1.5 GB; with deflate it is 1.3 GB, so my
>> hope is that it can be reduced further.
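>>
>> One more thing I may test is the shuffle filter in front of deflate. As
>> far as I understand it mainly helps fixed-width numeric data, so it may do
>> nothing for random text, but it is cheap to try. A sketch, assuming the
>> Java binding exposes H5Pset_shuffle like the other filter calls (the order
>> matters: shuffle must be set before deflate):
>>
>> H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>> H5.H5Pset_chunk(plist, RANK, chunkSize);
>> H5.H5Pset_shuffle(plist);    // byte-shuffle each chunk first
>> H5.H5Pset_deflate(plist, 6); // then compress the shuffled bytes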
>>
>> cheers, Håkon
>>
>> On 24 March 2010 17:25, Peter Cao <[email protected]> wrote:
>>
>>    Hi Håkon,
>>
>>    Glad to know it works for you. You also need to take care of the case
>>    where the last block is smaller than BLOCK_SIZE. This happens when the
>>    total size (25M) is not evenly divisible by BLOCK_SIZE. For better
>>    performance, make sure that BLOCK_SIZE is a multiple of CHUNK_SIZE.
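>>
>>    For the tail, a sketch along these lines (reusing the names from the
>>    program below) could go right after the main loop:
>>
>>        // write the final partial block, if any
>>        int remainder = (int) (DIMS[0] % BLOCK_SIZE);
>>        if (remainder > 0) {
>>            long[] tailCount = { remainder };
>>            int tailMsid = H5.H5Screate_simple(RANK, tailCount, null);
>>            H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
>>                    new long[] { DIMS[0] - remainder }, null,
>>                    tailCount, null);
>>            String[] tail = new String[remainder];
>>            System.arraycopy(strs, 0, tail, 0, remainder); // leftover strings
>>            H5.H5Dwrite(did, tid, tailMsid, fsid,
>>                    HDF5Constants.H5P_DEFAULT, tail);
>>            H5.H5Sclose(tailMsid);
>>        }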
>>
>>
>>    Thanks
>>    --pc
>>
>>
>>    Håkon Sagehaug wrote:
>>
>>        Hi Peter,
>>
>>        Thanks so much for the code; it seems to work very well. The only
>>        thing I found was that the start index for the next block to write
>>        into the HDF5 array was off by one, so instead of
>>
>>           start_idx = i;
>>
>>        I now have
>>
>>           start_idx = i + 1;
>>
>>        cheers, Håkon
>>
>>
>>
>>
>>        On 24 March 2010 01:19, Peter Cao <[email protected]> wrote:
>>
>>           Hi Håkon,
>>
>>           Below is a program that you can start with. I am using
>>           variable-length strings. For fixed-length strings there is some
>>           extra work: you may have to pad the strings to the same length.
>>
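>>           For the fixed-length case, the type setup might look like the
>>           sketch below, where MAX_LEN is an assumed upper bound on the
>>           string length and shorter strings are null-padded. A side
>>           benefit is that the string bytes then live inside the chunks
>>           where the deflate filter can see them; with H5T_VARIABLE the
>>           chunks hold only references into the global heap, so filters
>>           may gain little.
>>
>>              int MAX_LEN = 16; // assumed longest string, in bytes
>>              tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
>>              H5.H5Tset_size(tid, MAX_LEN);
>>              H5.H5Tset_strpad(tid, HDF5Constants.H5T_STR_NULLPAD);
>>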
>>           You may try different chunk sizes and block sizes to find the
>>           best performance.
>>
>>           =======================
>>           import ncsa.hdf.hdf5lib.H5;
>>           import ncsa.hdf.hdf5lib.HDF5Constants;
>>           import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;
>>
>>           public class CreateStrings {
>>
>>             private final static String H5_FILE = "G:\\temp\\strings.h5";
>>             private final static String DNAME = "/strs";
>>             private final static int RANK = 1;
>>             private final static long[] DIMS = { 25000000 };
>>             private final static long[] MAX_DIMS = {
>>                     HDF5Constants.H5S_UNLIMITED };
>>             private final static long[] CHUNK_SIZE = { 25000 };
>>             private final static int BLOCK_SIZE = 250000;
>>
>>             private void createDataset(int fid) throws Exception {
>>                 int did = -1, tid = -1, sid = -1, plist = -1;
>>
>>                 try {
>>
>>                     tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
>>                     // use variable length to save space
>>                     H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
>>                     sid = H5.H5Screate_simple(RANK, DIMS, MAX_DIMS);
>>
>>                     // figure out creation properties
>>
>>                     plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
>>                     H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>>                     H5.H5Pset_chunk(plist, RANK, CHUNK_SIZE);
>>
>>                     did = H5.H5Dcreate(fid, DNAME, tid, sid, plist);
>>                 } finally {
>>                     try {
>>                         H5.H5Tclose(tid); // also release the datatype copy
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                     try {
>>                         H5.H5Pclose(plist);
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                     try {
>>                         H5.H5Sclose(sid);
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                     try {
>>                         H5.H5Dclose(did);
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                 }
>>             }
>>
>>             private void writeData(int fid) throws Exception {
>>                 int did = -1, tid = -1, msid = -1, fsid = -1;
>>                 long[] count = { BLOCK_SIZE };
>>
>>                 try {
>>                     did = H5.H5Dopen(fid, DNAME);
>>                     tid = H5.H5Dget_type(did);
>>                     fsid = H5.H5Dget_space(did);
>>                     msid = H5.H5Screate_simple(RANK, count, null);
>>                     String[] strs = new String[BLOCK_SIZE];
>>
>>                     int idx = 0, block_indx = 0, start_idx = 0;
>>                     long t0 = 0, t1 = 0;
>>                     t0 = System.currentTimeMillis();
>>                     System.out.println("Total number of blocks = "
>>                             + (DIMS[0] / BLOCK_SIZE));
>>                     for (int i = 0; i < DIMS[0]; i++) {
>>                         strs[idx++] = "str" + i;
>>                         if (idx == BLOCK_SIZE) { // operator % is very expensive
>>                             idx = 0;
>>                             H5.H5Sselect_hyperslab(fsid,
>>                                     HDF5Constants.H5S_SELECT_SET,
>>                                     new long[] { start_idx }, null,
>>                                     count, null);
>>                             H5.H5Dwrite(did, tid, msid, fsid,
>>                                     HDF5Constants.H5P_DEFAULT, strs);
>>
>>                             if (block_indx == 10) {
>>                                 t1 = System.currentTimeMillis();
>>                                 // (t1 - t0) covers roughly 10 blocks, so
>>                                 // scale by the total block count and divide
>>                                 // by 10 * 60000 to estimate total minutes
>>                                 System.out.println("Total time (minutes) = "
>>                                         + ((t1 - t0) * (DIMS[0] / BLOCK_SIZE))
>>                                         / 1000 / 600);
>>                             }
>>
>>                             block_indx++;
>>                             start_idx = i + 1; // the "+ 1" fix noted above
>>                         }
>>
>>                     }
>>
>>                 } finally {
>>                     try {
>>                         H5.H5Tclose(tid); // release the opened datatype
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                     try {
>>                         H5.H5Sclose(fsid);
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                     try {
>>                         H5.H5Sclose(msid);
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                     try {
>>                         H5.H5Dclose(did);
>>                     } catch (HDF5Exception ex) {
>>                     }
>>                 }
>>             }
>>
>>             private void createFile() throws Exception {
>>                 int fid = -1;
>>
>>                 fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
>>                         HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
>>
>>                 if (fid < 0)
>>                     return;
>>
>>                 try {
>>                     createDataset(fid);
>>                     writeData(fid);
>>                 } finally {
>>                     H5.H5Fclose(fid);
>>                 }
>>             }
>>
>>             /**
>>              * @param args
>>              */
>>             public static void main(String[] args) {
>>                 try {
>>                     (new CreateStrings()).createFile();
>>                 } catch (Exception ex) {
>>                     ex.printStackTrace();
>>                 }
>>             }
>>
>>           }
>>           =========================
>>
>> --
>> Håkon Sagehaug, Scientific Programmer
>> Parallab, Uni BCCS/Uni Research
>> [email protected], phone +47 55584125
>>
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
