Hi Håkon,

Make sure DIMS is the total dimension size in H5.H5Dextend(did_SNP, DIMS),
i.e. the new dims size = old size + increase.
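To make that bookkeeping concrete, here is a minimal pure-Java sketch of the size arithmetic (the class and method names are mine, not from your code or the HDF5 API): keep a running total and pass that total, not the block size, to H5.H5Dextend.

```java
// Sketch of the dims arithmetic for extending a 1-D dataset.
// ExtendDims/nextDims are illustrative names, not part of the HDF5 API.
public class ExtendDims {
    // The value passed to H5Dextend must be the TOTAL size after extending:
    // new dims = old size + increase.
    public static long[] nextDims(long currentSize, long increase) {
        return new long[] { currentSize + increase };
    }

    public static void main(String[] args) {
        long size = 2;      // initial dataset size
        long blockSize = 2; // rows added per write
        for (int i = 0; i < 3; i++) {
            size = nextDims(size, blockSize)[0];
        }
        System.out.println(size); // 2 + 3*2 = 8
    }
}
```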

Thanks
--pc


Håkon Sagehaug wrote:
Hi Peter,

My question now was more about dynamically adding data to a dataset when I don't know the number of entries in the table I wish to store in the hdf file. The compression issue is no longer relevant, because the values we needed to store from the lines in the file were all integers. I'm struggling to get this part of the code working:

...

/* Need to extend the data set */
H5.H5Dextend(did_SNP, DIMS);
int extended_dataspace_id = H5.H5Dget_space(did_SNP);
H5.H5Sselect_all(extended_dataspace_id);
H5.H5Sselect_hyperslab(extended_dataspace_id,
        HDF5Constants.H5S_SELECT_SET,
        new long[] { start_idx }, null,
        count, null);
H5.H5Dwrite(did_SNP, type_int_id, msid,
        extended_dataspace_id,
        HDF5Constants.H5P_DEFAULT,
        currentSNPIdArray);

cheers, Håkon


On 12 April 2010 18:46, Peter Cao <[email protected]> wrote:

    Hi Håkon,

    HDF5 does not compress variable length data well (basically you are
    trying to compress addresses of pointers to the variable length data).
    For best performance, you should not use compression for variable
    length strings.


    Thanks
    --pc


    Håkon Sagehaug wrote:

        Hi again,

        In my current application that you helped with, I know how many
        lines I want to write beforehand, but I also wanted to get the
        use case working where I don't know this beforehand. So I tried
        to modify the program you more or less wrote for me. My test
        file contains many lines, each with one integer on it, like
        this:

        31643
        36594
        59354
        2481
        64079
        64181
        491566836

        In the test program below, I've set the initial size to 2 for
        the segments to write to the dataset. After the first segment
        is written I need to extend the dataset, select the portion to
        write using a hyperslab and then write it, but I'm having some
        problems. I tried to follow the example here [1], but have not
        succeeded. I pasted in the code below.

        public class HDFExtendLDData {
            private final static String H5_FILE = "/scratchtestHap/strings.h5";
            private final static String DNAME_SNP = "/snp.id.one";
            private final static int RANK = 1;
            private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };

            /***
             * Creates a dataset for holding values of type integer, with a given
             * dimension, chunking and a group name.
             *
             * @param fid
             * @param dims
             * @param chunkSize
             * @param groupName
             * @throws Exception
             */
            private void createIntegerDataset(int fid, long[] dims, long[] chunkSize,
                    String groupName) throws Exception {
                int did_snp = -1, type_int_id = -1, sid = -1, plist = -1, group_id = -1;

                try {
                    type_int_id = H5.H5Tcopy(HDF5Constants.H5T_STD_I32LE);

                    sid = H5.H5Screate_simple(RANK, dims, MAX_DIMS);

                    plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);

                    H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);

                    H5.H5Pset_chunk(plist, RANK, chunkSize);

                    H5.H5Pset_deflate(plist, 6);

                    group_id = H5.H5Gcreate(fid, groupName, HDF5Constants.H5P_DEFAULT);

                    did_snp = H5.H5Dcreate(group_id, groupName + DNAME_SNP,
                            type_int_id, sid, plist);

                    System.out.println("created for chr " + groupName);

                } finally {
                    try {
                        H5.H5Pclose(plist);
                    } catch (HDF5Exception ex) {
                    }
                    try {
                        H5.H5Sclose(sid);
                    } catch (HDF5Exception ex) {
                    }
                    try {
                        H5.H5Dclose(did_snp);
                        H5.H5Gclose(group_id);
                    } catch (HDF5Exception ex) {
                    }
                }
            }

            /***
             * Input a directory path that contains some files. Need to extract
             * the data from the files and create one data set for each file
             * within one hdf file.
             *
             * @param fid
             *            Hdf File id
             * @param sourceFolder
             *            Path to the folder
             * @throws Exception
             */
            private void writeDataFromFileToInt(int fid, String sourceFolder)
                    throws Exception {
                int did_SNP = -1, msid = -1, fsid = -1, timesWritten = 0, group_id = -1;

                Collection<File> fileCollection = FileUtils.listFiles(new File(
                        sourceFolder), new String[] { "txt" }, false);

                int filesAdded = 0;

                /* Loop through a directory of files. */
                for (File sourceFile : fileCollection) {
                    try {
                        String choromosome_tmp = sourceFile.getName().split("_")[1];
                        String choromosome = choromosome_tmp.substring(3,
                                choromosome_tmp.length());
                        choromosome = "/" + choromosome + "/";

                        /* Setting the initial size of the data set */
                        long[] DIMS = { 2 };
                        long[] CHUNK_SIZE = { 2 };
                        int BLOCK_SIZE = 2;

                        long[] count = { BLOCK_SIZE };

                        /* Creates a new data set for the file to parse */
                        createIntegerDataset(fid, DIMS, CHUNK_SIZE, choromosome);

                        /* open the group that holds the data set */
                        group_id = H5.H5Gopen(fid, choromosome);

                        /* open the data set */
                        did_SNP = H5.H5Dopen(group_id, choromosome + DNAME_SNP);

                        /* fetches the data type, should be integer */
                        int type_int_id = H5.H5Dget_type(did_SNP);

                        fsid = H5.H5Dget_space(did_SNP);

                        /* Memory space */
                        msid = H5.H5Screate_simple(RANK, count, null);

                        /* Array for storing the values */
                        int[] currentSNPIdArray = new int[BLOCK_SIZE];

                        /* File to read the values from */
                        BigFile ldFile = new BigFile(sourceFile.getAbsolutePath());

                        int idx = 0, block_indx = 0, start_idx = 0;
                        System.out.println("Started to parse the file");

                        int currentLine = 0;
                        timesWritten = 0;

                        /* Iterating over each line in the file */
                        for (String ldLine : ldFile) {

                            currentSNPIdArray[idx] = Integer.valueOf(ldLine);

                            idx++;

                            if (idx == BLOCK_SIZE) {
                                idx = 0;
                                if (timesWritten == 0) {
                                    /* Just write to the data set */
                                    H5.H5Sselect_hyperslab(fsid,
                                            HDF5Constants.H5S_SELECT_SET,
                                            new long[] { start_idx }, null,
                                            count, null);
                                    H5.H5Dwrite(did_SNP, type_int_id, msid, fsid,
                                            HDF5Constants.H5P_DEFAULT,
                                            currentSNPIdArray);
                                } else {
                                    /* Need to extend the data set */
                                    H5.H5Dextend(did_SNP, DIMS);
                                    int extended_dataspace_id = H5.H5Dget_space(did_SNP);
                                    H5.H5Sselect_all(extended_dataspace_id);
                                    H5.H5Sselect_hyperslab(extended_dataspace_id,
                                            HDF5Constants.H5S_SELECT_SET,
                                            new long[] { start_idx }, null,
                                            count, null);
                                    H5.H5Dwrite(did_SNP, type_int_id, msid,
                                            extended_dataspace_id,
                                            HDF5Constants.H5P_DEFAULT,
                                            currentSNPIdArray);
                                }

                                block_indx++;
                                start_idx = currentLine + 1;
                                timesWritten++;
                            }

                            currentLine++;
                        }
                        filesAdded++;

                        System.out.println("Finished parsing the file ");

                    } finally {
                        try {
                            H5.H5Gclose(group_id);
                            H5.H5Sclose(fsid);
                        } catch (HDF5Exception ex) {
                        }
                        try {
                            H5.H5Sclose(msid);
                        } catch (HDF5Exception ex) {
                        }
                        try {
                            H5.H5Dclose(did_SNP);
                        } catch (HDF5Exception ex) {
                        }
                    }
                }
            }

            public void createFile(String sourceFile) throws Exception {
                int fid = -1;

                fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                        HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

                if (fid < 0)
                    return;

                try {
                    writeDataFromFileToInt(fid, sourceFile);
                } finally {
                    H5.H5Fclose(fid);
                }
            }
        }

        When running the code I get this error

        Exception in thread "main" ncsa.hdf.hdf5lib.exceptions.HDF5LibraryException
           at ncsa.hdf.hdf5lib.H5.H5Dwrite_int(Native Method)
           at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1139)
           at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1181)
           at no.uib.bccs.esysbio.sample.clients.HDFExtendLDData.writeDataFromFileToInt(HDFExtendLDData.java:145)

        The line number in my code corresponds to where I'm writing to
        the dataset after I've extended it.

        Any tips on how to solve the issue?

        cheers, Håkon

        [1] http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java


        On 25 March 2010 16:13, Peter Cao <[email protected]> wrote:

           For compression, block size does not matter; chunk size matters.
           Usually a larger chunk size tends to compress better. We usually
           use 64KB to 1MB chunks for better performance. Try different
           chunk sizes, block sizes, and compression methods and levels to
           find the best I/O performance and compression ratio. As I
           mentioned earlier, if the content is random, compression will
           not help much.
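As a rough sketch of how a 64KB-1MB target translates into a 1-D chunk dimension for 4-byte integers (the class name, method name, and the 64 KB target are illustrative, not from the posted code):

```java
// Sketch: turn a target chunk size in bytes into a chunk dimension in
// elements. ChunkSize/chunkElements are illustrative names.
public class ChunkSize {
    public static long chunkElements(long targetBytes, long bytesPerElement) {
        return targetBytes / bytesPerElement;
    }

    public static void main(String[] args) {
        // 64 KB of 4-byte integers (e.g. H5T_STD_I32LE)
        System.out.println(chunkElements(64 * 1024, 4)); // 16384 elements
    }
}
```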


           Thanks
           --pc


           Håkon Sagehaug wrote:

               Hi

               Yes, the content is more or less a random set of characters.
               I'll try some combinations and see what is best. We need to
               transfer the file over a network, so that's why we need to
               compress as much as possible. Does the block size/chunk size
               make any difference?

               cheers, Håkon

               On 25 March 2010 15:48, Peter Cao <[email protected]> wrote:

                  Hi Håkon,

                  I don't need the code. As long as it works for you, I am
                  happy.

                  Deflate level 6 is a good compromise between file size and
                  performance. The compression ratio depends on the content.
                  If every string is like a random set of characters, the
                  compression will not help much. I will leave it to you to
                  try different compression options. If compression does not
                  help much, it is better not to use compression at all.
                  It's your call.


                  Thanks
                  --pc



                  Håkon Sagehaug wrote:



                      Hi Peter

                      Thanks for all the help so far. I've added code to add
                      the last elements; if you want it I can paste it in a
                      new email to you. One more question: we need to
                      compress the data. I've now tried like this, within
                      createDataset(...)

                      H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
                      H5.H5Pset_chunk(plist, RANK, chunkSize);
                      H5.H5Pset_deflate(plist, 9);

                      I'm not sure what the most efficient way is. I tried to
                      exchange the H5Pset_deflate(plist, 9) with

                      H5.H5Pset_szip(plist, HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);

                      but did not see any difference. I read that szip might
                      be better. If I don't use deflate the hdf file is
                      1.5 GB; with deflate it's 1.3 GB. So my hope is that it
                      can be further decreased in size.

                      cheers, Håkon

                      On 24 March 2010 17:25, Peter Cao <[email protected]> wrote:

                         Hi Håkon,

                         Glad to know it works for you. You also need to take
                         care of the case where the last block does not have
                         size BLOCK_SIZE. This will happen if the total size
                         (25M) is not divisible by BLOCK_SIZE. For better
                         performance, make sure that BLOCK_SIZE is divisible
                         by CHUNK_SIZE.
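The leftover-block case amounts to a remainder computation; a minimal sketch (the class and method names are illustrative, not from the posted code):

```java
// Sketch: how many rows remain for the final write when the total row
// count is not a multiple of BLOCK_SIZE. LastBlock is an illustrative name.
public class LastBlock {
    public static int lastBlockSize(int totalSize, int blockSize) {
        int rem = totalSize % blockSize;
        // a full block if it divides evenly, otherwise just the leftover rows
        return (rem == 0) ? blockSize : rem;
    }

    public static void main(String[] args) {
        System.out.println(lastBlockSize(25_000_000, 250_000)); // 250000
        System.out.println(lastBlockSize(7, 2));                // 1
    }
}
```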


                         Thanks
                         --pc


                         Håkon Sagehaug wrote:

                             Hi Peter,

                             Thanks so much for the code; it seems to work
                             very well. The only thing I found was that I had
                             to add 1 to the start index for the next write
                             in the hdf array, so instead of

                                start_idx = i;

                             I now have

                                start_idx = i + 1;
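The off-by-one follows from the loop indices: after a block covering rows [start_idx, start_idx + BLOCK_SIZE) is written, the loop variable i still points at the last row written, so the next block must start one past it. A tiny sketch (the class name is illustrative):

```java
// Sketch of the start-index fix: the next block begins one past the
// last row already written. StartIdx is an illustrative name.
public class StartIdx {
    public static int nextStart(int lastWrittenIndex) {
        return lastWrittenIndex + 1; // i.e. start_idx = i + 1
    }

    public static void main(String[] args) {
        // with BLOCK_SIZE = 2, rows 0-1 are written first and i ends at 1
        System.out.println(nextStart(1)); // 2: first row of the next block
    }
}
```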

                             cheers, Håkon




                             On 24 March 2010 01:19, Peter Cao <[email protected]> wrote:

                                Hi Håkon,

                                Below is a program that you can start with.
                                I am using variable length strings. For
                                fixed length strings there is some extra
                                work; you may have to make the strings the
                                same length.

                                You may try different chunk sizes and block
                                sizes to get the best performance.

=======================
import ncsa.hdf.hdf5lib.H5;
import ncsa.hdf.hdf5lib.HDF5Constants;
import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;

public class CreateStrings {

    private final static String H5_FILE = "G:\\temp\\strings.h5";
    private final static String DNAME = "/strs";
    private final static int RANK = 1;
    private final static long[] DIMS = { 25000000 };
    private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };
    private final static long[] CHUNK_SIZE = { 25000 };
    private final static int BLOCK_SIZE = 250000;

    private void createDataset(int fid) throws Exception {
        int did = -1, tid = -1, sid = -1, plist = -1;

        try {
            tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
            // use variable length to save space
            H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
            sid = H5.H5Screate_simple(RANK, DIMS, MAX_DIMS);

            // figure out creation properties
            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, RANK, CHUNK_SIZE);

            did = H5.H5Dcreate(fid, DNAME, tid, sid, plist);
        } finally {
            try {
                H5.H5Pclose(plist);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(sid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void writeData(int fid) throws Exception {
        int did = -1, tid = -1, msid = -1, fsid = -1;
        long[] count = { BLOCK_SIZE };

        try {
            did = H5.H5Dopen(fid, DNAME);
            tid = H5.H5Dget_type(did);
            fsid = H5.H5Dget_space(did);
            msid = H5.H5Screate_simple(RANK, count, null);
            String[] strs = new String[BLOCK_SIZE];

            int idx = 0, block_indx = 0, start_idx = 0;
            long t0 = 0, t1 = 0;
            t0 = System.currentTimeMillis();
            System.out.println("Total number of blocks = "
                    + (DIMS[0] / BLOCK_SIZE));
            for (int i = 0; i < DIMS[0]; i++) {
                strs[idx++] = "str" + i;
                if (idx == BLOCK_SIZE) { // operator % is very expensive
                    idx = 0;
                    H5.H5Sselect_hyperslab(fsid,
                            HDF5Constants.H5S_SELECT_SET,
                            new long[] { start_idx }, null, count, null);
                    H5.H5Dwrite(did, tid, msid, fsid,
                            HDF5Constants.H5P_DEFAULT, strs);

                    if (block_indx == 10) {
                        t1 = System.currentTimeMillis();
                        System.out.println("Total time (minutes) = "
                                + ((t1 - t0) * (DIMS[0] / BLOCK_SIZE)) / 1000
                                / 600);
                    }

                    block_indx++;
                    start_idx = i;
                }
            }
        } finally {
            try {
                H5.H5Sclose(fsid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(msid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void createFile() throws Exception {
        int fid = -1;

        fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

        if (fid < 0)
            return;

        try {
            createDataset(fid);
            writeData(fid);
        } finally {
            H5.H5Fclose(fid);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            (new CreateStrings()).createFile();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

}
=========================




_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org











                      --
                      Håkon Sagehaug, Scientific Programmer
                      Parallab, Uni BCCS/Uni Research
                      [email protected], phone +47 55584125














