For compression, block size does not matter; chunk size does. A larger chunk size usually compresses better. We typically use a chunk size of 64KB to 1MB for good performance. Try different chunk sizes, block sizes, and compression methods and levels to find the best balance of I/O performance and compression ratio. As I mentioned earlier, if the content is random, compression will not help much.
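
For example, something like this (an untested sketch; 65536 elements per
chunk is just a starting point, tune it so the chunk's byte size lands in
the 64KB-1MB range for your element size):

    import ncsa.hdf.hdf5lib.H5;
    import ncsa.hdf.hdf5lib.HDF5Constants;

    public class ChunkedDeflateSketch {
        // build a dataset creation property list with a chunked layout
        // and deflate compression; tune the chunk dimension and the
        // deflate level (1-9) for your data
        static int makePlist(int rank) throws Exception {
            int plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, rank, new long[] { 65536 });
            H5.H5Pset_deflate(plist, 6);
            return plist;
        }
    }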

Thanks
--pc


Håkon Sagehaug wrote:
Hi

Yes, the content is more or less a random set of characters. I'll try some combinations and see what works best. We need to transfer the file over a network, so that's why we want to compress as much as possible. Will the block size/chunk size make any difference?

cheers, Håkon

On 25 March 2010 15:48, Peter Cao <[email protected]> wrote:

    Hi Håkon,

    I don't need the code. As long as it works for you, I am happy.

    Deflate level 6 is a good balance between file size and performance.
    The compression ratio depends on the content. If every string is like a
    random set of characters, compression will not help much. I will leave
    it to you to try different compression options. If compression does not
    help much, it is better not to use compression at all. It's your call.
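
    A quick way to compare settings is simply to check the output file
    size after each run, e.g. from Java:

        import java.io.File;

        // print the size of the result file so that runs with
        // different compression settings are easy to compare
        public class SizeCheck {
            public static void main(String[] args) {
                File f = new File("G:\\temp\\strings.h5");
                System.out.println(f.length() / (1024 * 1024) + " MB");
            }
        }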


    Thanks
    --pc



    Håkon Sagehaug wrote:



        Hi Peter

        Thanks for all the help so far. I've added code to write the
        last elements; if you want to have it, I can paste it in a new
        email to you. One more question: we need to compress the data.
        I've now tried it like this, within createDataset(...):


        H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
        H5.H5Pset_chunk(plist, RANK, chunkSize);
        H5.H5Pset_deflate(plist, 9);

        I'm not sure what the most efficient way is. I tried exchanging
        H5Pset_deflate(plist, 9) with

        H5.H5Pset_szip(plist, HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);

        but did not see any difference. I read that szip might be
        better. If I don't use deflate the HDF5 file is 1.5 GB; with
        deflate it's 1.3 GB. So my hope is that it can be further
        reduced in size.

        cheers, Håkon

        On 24 March 2010 17:25, Peter Cao <[email protected]> wrote:

           Hi Håkon,

           Glad to know it works for you. You also need to take care of
           the case where the last block does not have the full
           BLOCK_SIZE. This will happen if the total size (25M) is not
           evenly divisible by BLOCK_SIZE. For better performance, make
           sure that BLOCK_SIZE is a multiple of CHUNK_SIZE.
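
           Something like this (an untested sketch, using the variable
           names from the program below), placed right after the for-loop
           in writeData, should handle the leftover elements:

               // idx elements are left over when the total size is not
               // a multiple of BLOCK_SIZE; write them with a smaller
               // memory space
               if (idx > 0) {
                   long[] remain = { idx };
                   int msid2 = H5.H5Screate_simple(RANK, remain, null);
                   H5.H5Sselect_hyperslab(fsid,
                           HDF5Constants.H5S_SELECT_SET,
                           new long[] { start_idx }, null, remain, null);
                   String[] tail = new String[idx];
                   System.arraycopy(strs, 0, tail, 0, idx);
                   H5.H5Dwrite(did, tid, msid2, fsid,
                           HDF5Constants.H5P_DEFAULT, tail);
                   H5.H5Sclose(msid2);
               }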


           Thanks
           --pc


           Håkon Sagehaug wrote:

               Hi Peter,

                Thanks so much for the code, it seems to work very well.
                The only thing I found was that the index for the next
                block to write in the HDF5 array was off by one. Instead of

                   start_idx = i;

                I now have

                   start_idx = i + 1;
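
                The reason, I think, is that when idx reaches BLOCK_SIZE at
                loop index i, the block just written covers dataset indices
                start_idx through i, so the next block has to begin one
                past i:

                   if (idx == BLOCK_SIZE) {
                       idx = 0;
                       // ... hyperslab selection and H5Dwrite as in the
                       // program below ...
                       // the block just written covers start_idx .. i,
                       // so the next block begins at i + 1
                       start_idx = i + 1;
                   }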

               cheers, Håkon




                On 24 March 2010 01:19, Peter Cao <[email protected]> wrote:

                   Hi Håkon,

                   Below is a program that you can start with. I am using
                   variable-length strings. For fixed-length strings there
                   is some extra work: you may have to pad all the strings
                   to the same length.
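
                   For example (untested; FIXED_LEN and s are placeholders
                   for the fixed length you choose and one of your strings):

                       // fixed-length alternative: every element takes
                       // exactly FIXED_LEN bytes in the file
                       int ftid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
                       H5.H5Tset_size(ftid, FIXED_LEN);

                       // right-pad each string so they all have the same
                       // length before writing
                       String padded = String.format("%1$-" + FIXED_LEN + "s", s);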

                   You may try different chunk sizes and block sizes to
                   get the best performance.

                   =======================
                   import ncsa.hdf.hdf5lib.H5;
                   import ncsa.hdf.hdf5lib.HDF5Constants;
                   import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;

                   public class CreateStrings {

                       private final static String H5_FILE = "G:\\temp\\strings.h5";
                       private final static String DNAME = "/strs";
                       private final static int RANK = 1;
                       private final static long[] DIMS = { 25000000 };
                       private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };
                       private final static long[] CHUNK_SIZE = { 25000 };
                       private final static int BLOCK_SIZE = 250000;

                       private void createDataset(int fid) throws Exception {
                           int did = -1, tid = -1, sid = -1, plist = -1;

                           try {
                               tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
                               // use variable-length strings to save space
                               H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
                               sid = H5.H5Screate_simple(RANK, DIMS, MAX_DIMS);

                               // set up the dataset creation properties: chunked layout
                               plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
                               H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
                               H5.H5Pset_chunk(plist, RANK, CHUNK_SIZE);

                               did = H5.H5Dcreate(fid, DNAME, tid, sid, plist);
                           } finally {
                               try { H5.H5Pclose(plist); } catch (HDF5Exception ex) {}
                               try { H5.H5Sclose(sid); } catch (HDF5Exception ex) {}
                               try { H5.H5Tclose(tid); } catch (HDF5Exception ex) {}
                               try { H5.H5Dclose(did); } catch (HDF5Exception ex) {}
                           }
                       }

                       private void writeData(int fid) throws Exception {
                           int did = -1, tid = -1, msid = -1, fsid = -1;
                           long[] count = { BLOCK_SIZE };

                           try {
                               did = H5.H5Dopen(fid, DNAME);
                               tid = H5.H5Dget_type(did);
                               fsid = H5.H5Dget_space(did);
                               msid = H5.H5Screate_simple(RANK, count, null);
                               String[] strs = new String[BLOCK_SIZE];

                               int idx = 0, block_indx = 0, start_idx = 0;
                               long t0 = 0, t1 = 0;
                               t0 = System.currentTimeMillis();
                               System.out.println("Total number of blocks = "
                                       + (DIMS[0] / BLOCK_SIZE));
                               for (int i = 0; i < DIMS[0]; i++) {
                                   strs[idx++] = "str" + i;
                                   if (idx == BLOCK_SIZE) { // operator % is very expensive
                                       idx = 0;
                                       H5.H5Sselect_hyperslab(fsid,
                                               HDF5Constants.H5S_SELECT_SET,
                                               new long[] { start_idx }, null, count, null);
                                       H5.H5Dwrite(did, tid, msid, fsid,
                                               HDF5Constants.H5P_DEFAULT, strs);

                                       if (block_indx == 10) {
                                           // rough estimate: extrapolate the time for the
                                           // first ~10 blocks to all blocks, in minutes
                                           t1 = System.currentTimeMillis();
                                           System.out.println("Total time (minutes) = "
                                                   + ((t1 - t0) * (DIMS[0] / BLOCK_SIZE))
                                                   / 1000 / 600);
                                       }

                                       block_indx++;
                                       start_idx = i; // as noted above, this should be i + 1
                                   }
                               }
                           } finally {
                               try { H5.H5Sclose(fsid); } catch (HDF5Exception ex) {}
                               try { H5.H5Sclose(msid); } catch (HDF5Exception ex) {}
                               try { H5.H5Tclose(tid); } catch (HDF5Exception ex) {}
                               try { H5.H5Dclose(did); } catch (HDF5Exception ex) {}
                           }
                       }

                       private void createFile() throws Exception {
                           int fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                                   HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

                           if (fid < 0)
                               return;

                           try {
                               createDataset(fid);
                               writeData(fid);
                           } finally {
                               H5.H5Fclose(fid);
                           }
                       }

                       /**
                        * @param args
                        */
                       public static void main(String[] args) {
                           try {
                               (new CreateStrings()).createFile();
                           } catch (Exception ex) {
                               ex.printStackTrace();
                           }
                       }
                   }
                   =======================











        --
        Håkon Sagehaug, Scientific Programmer
        Parallab, Uni BCCS/Uni Research
        [email protected], phone +47 55584125






