Hi Håkon,

From your pseudo code, I can see you are writing 1.5M lines at a time.

Basically your program is right. One minor change at H5Dwrite():

      H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1, msid, fsid,
                       HDF5Constants.H5P_DEFAULT, valuesToWrite);

where msid is the memory space, which you can get from H5Screate_simple(),
and fsid is the file space, which you can get from H5Dget_space(). You need
to pass fsid to H5Sselect_hyperslab() to select the part to write.
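For the part-by-part write, the offset bookkeeping can be sketched in plain Java. The H5 calls that would consume the offsets are shown as comments, and the sizes (25M lines total, 1.5M per part) come from the pseudo code below; the class and method names are mine, not part of hdf-java:

```java
// Offset bookkeeping for writing a 25M-line dataset in parts of at most
// maxRecords lines. The hdf-java calls that would use each {start, count}
// pair are shown as comments.
public class ChunkOffsets {

    /** Start offset of each part when total lines are written chunkSize at a time. */
    static long[] partStarts(long total, long chunkSize) {
        int n = (int) ((total + chunkSize - 1) / chunkSize); // ceiling division
        long[] starts = new long[n];
        for (int i = 0; i < n; i++)
            starts[i] = i * chunkSize;
        return starts;
    }

    public static void main(String[] args) {
        long total = 25_000_000L, max = 1_500_000L;
        for (long offset : partStarts(total, max)) {
            long[] start = { offset };                           // where this part begins
            long[] count = { Math.min(max, total - offset) };    // last part holds only 1M lines

            // msid = H5.H5Screate_simple(1, count, null);
            // fsid = H5.H5Dget_space(dataset_id);
            // H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
            //         start, null, count, null);
            // H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1, msid, fsid,
            //         HDF5Constants.H5P_DEFAULT, valuesToWrite);
        }
        System.out.println(partStarts(total, max).length + " parts");
    }
}
```

Note that 25M does not divide evenly by 1.5M, so the last part carries only 1M lines and the memory space must be created with the actual count, not maxRecords.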

(If I can find time over the weekend, I will write a simple program based on
your pseudo code.)

Thanks
--pc


Håkon Sagehaug wrote:
Hi Peter,

Thanks for the reply, and sorry if I'm asking a lot of questions, but I still can't figure out how to connect the write of a dataset to a file. We want to give the HDF file to another program that does some analysis for us. I guess my pseudo code looks something like this.

Assume that the file has 25M lines

List<String> lineEntry = new ArrayList<String>();
int datasetSize = 25000000;
int index = 0;
// Number of strings to write in one chunk
int maxRecords = 1500000;
for (String line : file F) {   // pseudo: iterate over the lines of file F
    lineEntry.add(line);
    index++;
    if (index % maxRecords == 0) {
        // Find out where in the dataset to start the writing from, using a hyperslab
        String[] valuesToWrite = lineEntry.toArray(new String[0]);
        int dataspace_id = H5.H5Dget_space(dataset_id);
        H5.H5Sselect_all(dataspace_id);
        // On the 2nd iteration the start index for writing is 1500000,
        // so the write covers lines 1.5M -> 3M
        H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_NOTB,
                start, null, count, null);
        H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1,
                HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
                HDF5Constants.H5P_DEFAULT, valuesToWrite);
        lineEntry.clear();
    }
}

Does this look correct? Also, I don't need to extend the dataset if I can allocate 25M entries up front, as long as I don't have to keep them all in memory at the same time.

cheers, Håkon
On 22 March 2010 17:01, Peter Cao <[email protected]> wrote:

    Hi Håkon,

    A minor change to writeUn(): for testing purposes, use a String[]
    array instead of StringBuffer and pass the string array directly
    to H5Dwrite(). Currently, H5Dwrite() in hdf-java does not handle
    2D arrays because of a performance issue. In your case, you do not
    need to call H5Dwrite() in writeUn(), since you are going to write
    the data part by part later.

    H5.H5Dextend() will be replaced by H5Dset_extent() in HDF5 1.8.
    The new HDF5 1.8 APIs are not yet supported in hdf-java; we are
    still working on this. For now, just use H5Dextend().

    When you create the dataset, you already have space for 25M
    strings (i.e. new long[] { 25000000 }). Do you want to extend your
    dataspace beyond that? If not, you do not need to call
    H5Dextend(). Just select the chunks you want to write.
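The "no extend needed" case above boils down to comparing the end of the write against the current dimension. A minimal sketch of that check (the class and method names are mine, not part of hdf-java):

```java
public class ExtendCheck {

    /** Returns the new dimension sizes if the dataset must grow, or null if the
     *  preallocated space already covers the write (no H5Dextend() needed). */
    static long[] dimsAfterWrite(long[] currentDims, long writeStart, long writeCount) {
        long required = writeStart + writeCount;
        if (required <= currentDims[0])
            return null; // e.g. a 25M-entry dataspace already covers this part
        return new long[] { required }; // would be passed to H5.H5Dextend(dataset_id, ...)
    }

    public static void main(String[] args) {
        long[] dims = { 25_000_000L };
        // Writing lines 1.5M..3M fits inside the preallocated space:
        System.out.println(dimsAfterWrite(dims, 1_500_000L, 1_500_000L) == null);
    }
}
```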


    Thanks
    --pc



    Håkon Sagehaug wrote:

        Hi

        So I tried, but I'm not sure it worked. I can't figure out how to
        connect the dataset to a file so that I can view it in HDFView.
        Here is my method for writing an array of Strings to a dataset.


        private static void writeUn() {
            int file_id = -1;
            int dcpl_id = -1;
            int dataspace_id = -1;
            int dataset_id = -1;
            int memtype_id = -1;
            int filetype_id = -1;
            int plist = -1;

            long[] dims = { DIM_X };
            long[] chunk_dims = { CHUNK_X };
            long[] maxdims = { HDF5Constants.H5S_UNLIMITED };
            byte[][] dset_data = new byte[DIM_X][SDIM];
            StringBuffer[] str_data = new StringBuffer[DIM_X];

            // Initialize the data.
            for (int indx = 0; indx < DIM_X; indx++)
                str_data[indx] = new StringBuffer("iteration " + (indx + 1));

            // Create a new file using default properties.
            try {
                file_id = H5.H5Fcreate(FILENAME_A, HDF5Constants.H5F_ACC_TRUNC,
                        HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
            } catch (Exception e) {
                e.printStackTrace();
            }

            try {
                filetype_id = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
                H5.H5Tset_size(filetype_id, SDIM);

                plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
                H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
                H5.H5Pset_chunk(plist, 1, new long[] { 1024 });

                H5.H5Pset_deflate(plist, 5);

                dataspace_id = H5.H5Screate_simple(1, new long[] { 25000000 },
                        new long[] { HDF5Constants.H5S_UNLIMITED });
            } catch (Exception e) {
                e.printStackTrace();
            }

            // Write the data to the dataset.
            try {
                for (int indx = 0; indx < DIM_X; indx++) {
                    for (int jndx = 0; jndx < SDIM; jndx++) {
                        if (jndx < str_data[indx].length())
                            dset_data[indx][jndx] = (byte) str_data[indx].charAt(jndx);
                        else
                            dset_data[indx][jndx] = 0;
                    }
                }
                if ((dataset_id >= 0) && (memtype_id >= 0))
                    H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1,
                            HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
                            HDF5Constants.H5P_DEFAULT, dset_data);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        So my question is: is this the correct way of doing it, and how
        do I connect the dataset to the file? I guess it happens at the
        time of creating the dataset.

        After this, is the way forward like this?

        1. H5.H5Dextend(dataset_id, extdims);
        2. dataspace_id = H5.H5Dget_space(dataset_id);
        3. H5.H5Sselect_all(dataspace_id);
           // Subtract a hyperslab reflecting the original dimensions from
           // the selection. The selection now contains only the newly
           // extended portions of the dataset.
           count[0] = dims[0];
           count[1] = dims[1];
           H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_NOTB,
                   start, null, count, null);

           // Write the data to the selected portion of the dataset.
           if (dataset_id >= 0)
               H5.H5Dwrite(dataset_id, HDF5Constants.H5T_NATIVE_INT,
                       HDF5Constants.H5S_ALL, dataspace_id,
                       HDF5Constants.H5P_DEFAULT, extend_dset_data);
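For a 1D dataset, the select-all-then-subtract step above simply leaves the region between the old size and the new size selected, so the same region can be computed directly. A small sketch of that arithmetic (class and method names are mine):

```java
public class NewRegion {

    /** For a 1D dataset extended from oldDim to extDim elements, returns the
     *  {start, count} pair describing the newly added portion -- the same
     *  region that H5Sselect_all followed by an H5S_SELECT_NOTB hyperslab
     *  over the old dimensions leaves selected. */
    static long[] newlyExtended(long oldDim, long extDim) {
        return new long[] { oldDim, extDim - oldDim };
    }

    public static void main(String[] args) {
        // Extending from 1.5M to 3M entries: the new portion starts at 1.5M
        // and is 1.5M entries long.
        long[] region = newlyExtended(1_500_000L, 3_000_000L);
        System.out.println(region[0] + " " + region[1]);
    }
}
```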

        I also see that H5.H5Dextend is deprecated as of version 1.8;
        is there another method to use?

        cheers, Håkon
        On 19 March 2010 16:05, Peter Cao <[email protected]> wrote:

           Hi Håkon,

           I assume you are using a 1D array of strings. Here are some hints
           for you:

           1) You may just use a string datatype. Use a variable-length
              string if your strings have different sizes, or a fixed-length
              string if your strings are about the same length, e.g.
                     tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
                     // for a fixed length of 128
                     H5.H5Tset_size(tid, 128);
                     // for variable length
                     H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
           2) Set the dataset creation properties for chunking and compression:
                     plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
                     H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
                     // set the chunk size to be about 2MB for best performance
                     H5.H5Pset_chunk(plist, 1, new long[] {1024});
                     H5.H5Pset_deflate(plist, 5);

           3) Set the dimension sizes, e.g.
                     sid = H5.H5Screate_simple(1, new long[] {25000000},
                             new long[] {HDF5Constants.H5S_UNLIMITED});
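With the fixed-length datatype from hint 1, each string must be padded or truncated to the type's size before the write, which is what the byte-copy loop in writeUn() does character by character. A self-contained sketch of that conversion (the helper name is mine; the slot size 128 matches the example above):

```java
import java.nio.charset.StandardCharsets;

public class FixedLengthStrings {

    /** Packs strings into one flat byte array of fixed-size slots, padding
     *  with NUL bytes and truncating anything longer than slotSize. This is
     *  the memory layout a fixed-length H5T_C_S1 type expects. */
    static byte[] pack(String[] values, int slotSize) {
        byte[] out = new byte[values.length * slotSize]; // zero-filled by default
        for (int i = 0; i < values.length; i++) {
            byte[] b = values[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(b, 0, out, i * slotSize, Math.min(b.length, slotSize));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack(new String[] { "iteration 1", "iteration 2" }, 128);
        System.out.println(packed.length);
    }
}
```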


           Thanks
           --pc


           Håkon Sagehaug wrote:

               Hi Peter

               My problem actually comes before I can create the dataset the
               first time: I can't figure out the correct data type to use.
               I guess I should use a byte type, since the strings are
               converted to bytes.

               Håkon

               On 19 March 2010 15:29, Peter Cao <[email protected]> wrote:

                  Håkon,

                  There was a typo in my previous email. You do NOT need to
                  read the first chunk in order to write the second chunk.
                  You can just select whatever chunks you want to write.

                  Sorry for misleading you.

                  Thanks
                  --pc


                  Håkon Sagehaug wrote:

                      Hi Peter

                      I'm trying to do it with the chunk-by-chunk approach,
                      but I'm having trouble creating the dataset. In example
                      [1] it's done like this:

                          H5.H5Dcreate(file_id, DATASETNAME,
                                  HDF5Constants.H5T_STD_I32LE,
                                  dataspace_id, dcpl_id);

                      That type is for int, but I can't seem to find the
                      correct one for string. In example [2], with string
                      arrays, it looks like this:

                          H5.H5Dcreate(file_id, DATASETNAME, filetype_id,
                                  dataspace_id, HDF5Constants.H5P_DEFAULT);

                      If I create the dataset like this and then dynamically
                      add data, I only get the first byte of each string.
                      Any tips on what type I should use?


                      Håkon

                      [1] http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java
                      [2] http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datatypes/H5Ex_T_String.java




                      --
                      Håkon Sagehaug, Scientific Programmer
                      Parallab, Uni BCCS/Uni Research
                      [email protected], phone +47 55584125
------------------------------------------------------------------------

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org