Hi Peter,

Thanks for the reply, and sorry if I'm asking a lot of questions, but I
still can't figure out how to connect the writing of a dataset to a file.
We want to give the HDF file to another program that does some analysis
for us. I guess my pseudocode looks something like this.

Assume that the file has 25M lines

List<String> lineEntry = new ArrayList<String>();
int datasetSize = 25000000;
int index = 0;
// Number of strings to write in one chunk
int maxRecords = 1500000;
for (String line : file) { // iterate over the 25M lines of F
    lineEntry.add(line);
    index++;
    if (index % maxRecords == 0) {
        String[] valuesToWrite = lineEntry.toArray(new String[0]);

        // Find out where in the dataset to start writing, using a
        // hyperslab. On the 2nd flush index is 3000000, so the start
        // is 1500000 and the write covers records 1.5M -> 3M.
        int dataspace_id = H5.H5Dget_space(dataset_id);
        long[] start = { index - maxRecords };
        long[] count = { maxRecords };
        H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_SET,
                start, null, count, null);

        // Memory dataspace matching the chunk, so H5Dwrite knows the
        // shape of valuesToWrite; filetype_id is the sized string type
        int memspace_id = H5.H5Screate_simple(1, count, null);
        H5.H5Dwrite(dataset_id, filetype_id, memspace_id, dataspace_id,
                HDF5Constants.H5P_DEFAULT, valuesToWrite);
        lineEntry.clear();
    }
}

Does this look correct? Also, I don't need to extend the dataset if I can
allocate 25M entries up front, as long as I don't have to keep them all in
memory at the same time.
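
For reference, here is a minimal sketch of how I understand the dataset
gets connected to the file at creation time (exception handling omitted,
and the dataset name "/lines" is only a placeholder), reusing the string
type and chunked property list from writeUn():

    // Create the file, as in writeUn()
    int file_id = H5.H5Fcreate(FILENAME_A, HDF5Constants.H5F_ACC_TRUNC,
            HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

    // Fixed-length string type of SDIM bytes
    int filetype_id = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
    H5.H5Tset_size(filetype_id, SDIM);

    // Chunked, compressed dataset creation property list
    int plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
    H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
    H5.H5Pset_chunk(plist, 1, new long[] { 1024 });
    H5.H5Pset_deflate(plist, 5);

    // 1D dataspace with room for all 25M strings
    int dataspace_id = H5.H5Screate_simple(1, new long[] { 25000000 },
            new long[] { HDF5Constants.H5S_UNLIMITED });

    // This H5Dcreate call is what ties the dataset to the file;
    // "/lines" is a placeholder name
    int dataset_id = H5.H5Dcreate(file_id, "/lines", filetype_id,
            dataspace_id, plist);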

cheers, Håkon
On 22 March 2010 17:01, Peter Cao <[email protected]> wrote:

> Hi Håkon,
>
> A minor change to writeUn(): for testing purposes, use a String[] array
> instead of StringBuffer, and pass the string array directly to
> H5Dwrite(). Currently, H5Dwrite() in hdf-java does not handle 2D arrays
> because of a performance issue. In your case, you do not need to call
> H5Dwrite() in writeUn() at all, since you are going to write the data
> part by part later.
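>
> For example, something like this (just a sketch, using the sized string
> type from your code):
>
>     String[] str_data = new String[DIM_X];
>     for (int indx = 0; indx < DIM_X; indx++)
>         str_data[indx] = "iteration " + (indx + 1);
>     // filetype_id is your H5T_C_S1 copy with H5Tset_size(filetype_id, SDIM)
>     H5.H5Dwrite(dataset_id, filetype_id, HDF5Constants.H5S_ALL,
>             HDF5Constants.H5S_ALL, HDF5Constants.H5P_DEFAULT, str_data);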
>
> H5.H5Dextend() will be replaced with H5Dset_extent() in HDF5 1.8. The new
> HDF5 1.8 APIs are
> not supported in hdf-java. We are still working on this. For now, just use
> H5Dextend().
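>
> For example, to grow a 1D dataset (the new size here is arbitrary):
>
>     H5.H5Dextend(dataset_id, new long[] { 30000000 });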
>
> When you create the dataset, you already have space for 25M strings
> (i.e. new long[] { 25000000 }).
> Do you want to extend your dataspace beyond that? If not, you do not
> need to call H5Dextend(). Just select the chunks you want to write.
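>
> For example, selecting and writing the second block of 1.5M records
> might look like this (a sketch; the memory space must match the
> selection size, and valuesToWrite is a String[] holding the 1.5M
> strings for this block):
>
>     int fspace_id = H5.H5Dget_space(dataset_id);
>     long[] start = { 1500000 };
>     long[] count = { 1500000 };
>     H5.H5Sselect_hyperslab(fspace_id, HDF5Constants.H5S_SELECT_SET,
>             start, null, count, null);
>     int mspace_id = H5.H5Screate_simple(1, count, null);
>     H5.H5Dwrite(dataset_id, filetype_id, mspace_id, fspace_id,
>             HDF5Constants.H5P_DEFAULT, valuesToWrite);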
>
>
> Thanks
> --pc
>
>
>
> Håkon Sagehaug wrote:
>
>> Hi
>>
>> So I tried, but I'm not sure it worked. I can't figure out how to
>> connect the dataset to a file so that I can view it in HDFView. Here is
>> my method for writing an array of Strings to a dataset.
>>
>>
>> private static void writeUn() {
>>        int file_id = -1;
>>        int dcpl_id = -1;
>>        int dataspace_id = -1;
>>        int dataset_id = -1;
>>        int memtype_id = -1;
>>        int filetype_id = -1;
>>        int plist = -1;
>>
>>        long[] dims = { DIM_X };
>>        long[] chunk_dims = { CHUNK_X };
>>        long[] maxdims = { HDF5Constants.H5S_UNLIMITED };
>>        byte[][] dset_data = new byte[DIM_X][SDIM];
>>        StringBuffer[] str_data = new StringBuffer[DIM_X];
>>
>>        // Initialize the dataset.
>>        for (int indx = 0; indx < DIM_X; indx++)
>>            str_data[indx] = new StringBuffer(String.valueOf("iteration "
>>                    + (indx + 1)));
>>
>>        // Create a new file using default properties.
>>        try {
>>            file_id = H5.H5Fcreate(FILENAME_A, HDF5Constants.H5F_ACC_TRUNC,
>>                    HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
>>        } catch (Exception e) {
>>            e.printStackTrace();
>>        }
>>
>>        try {
>>            filetype_id = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
>>            H5.H5Tset_size(filetype_id, SDIM);
>>
>>            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
>>            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>>            H5.H5Pset_chunk(plist, 1, new long[] { 1024 });
>>
>>            H5.H5Pset_deflate(plist, 5);
>>
>>            dataset_id = H5.H5Screate_simple(1, new long[] { 25000000 },
>>                    new long[] { HDF5Constants.H5S_UNLIMITED });
>>        } catch (Exception e) {
>>            e.printStackTrace();
>>        }
>>
>>        // Write the data to the dataset.
>>        try {
>>            for (int indx = 0; indx < DIM_X; indx++) {
>>                for (int jndx = 0; jndx < SDIM; jndx++) {
>>                    if (jndx < str_data[indx].length())
>>                        dset_data[indx][jndx] = (byte) str_data[indx]
>>                                .charAt(jndx);
>>                    else
>>                        dset_data[indx][jndx] = 0;
>>                }
>>            }
>>            if ((dataset_id >= 0) && (memtype_id >= 0))
>>                H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1,
>>                        HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
>>                        HDF5Constants.H5P_DEFAULT, dset_data);
>>        } catch (Exception e) {
>>            e.printStackTrace();
>>        }
>> }
>>
>> So my question is: is this the correct way of doing it, and how do I
>> connect the dataset to the file? I guess it happens at the time of
>> creating the dataset.
>>
>> After this, is the way forward something like this?
>>
>> 1. H5.H5Dextend(dataset_id, extdims);
>> 2. dataspace_id = H5.H5Dget_space(dataset_id);
>> 3. H5.H5Sselect_all(dataspace_id);
>>    // Subtract a hyperslab reflecting the original dimensions from the
>>    // selection. The selection now contains only the newly extended
>>    // portion of the dataset.
>>    count[0] = dims[0];
>>    H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_NOTB,
>>            start, null, count, null);
>>
>>    // Write the data to the selected portion of the dataset.
>>    if (dataset_id >= 0)
>>        H5.H5Dwrite(dataset_id, filetype_id, HDF5Constants.H5S_ALL,
>>                dataspace_id, HDF5Constants.H5P_DEFAULT, extend_dset_data);
>>
>> I also see that H5.H5Dextend is deprecated from version 1.8; is there
>> another method to use?
>>
>> cheers, Håkon
>> On 19 March 2010 16:05, Peter Cao <[email protected]> wrote:
>>
>>    Hi Håkon,
>>
>>    I assume you are using a 1D array of strings. Here are some hints
>>    for you:
>>
>>    1) You may just use a string datatype. You can use a variable-length
>>       string if your strings have different sizes, or a fixed-length
>>       string if your strings are about the same length, e.g.
>>              tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
>>              // for a fixed length of 128
>>              H5.H5Tset_size(tid, 128);
>>              // for variable length
>>              H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
>>    2) Set the dataset creation property for chunking and compression:
>>                  plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
>>                  H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>>                  // set the chunk size to be about 2MB for best performance
>>                  H5.H5Pset_chunk(plist, 1, new long[] { 1024 });
>>                  H5.H5Pset_deflate(plist, 5);
>>
>>    3) Set the dimension sizes, e.g.
>>              sid = H5.H5Screate_simple(1, new long[] { 25000000 },
>>                      new long[] { HDF5Constants.H5S_UNLIMITED });
>>
>>
>>    Thanks
>>    --pc
>>
>>
>>    Håkon Sagehaug wrote:
>>
>>        Hi Peter
>>
>>        My problem actually comes before I can create the dataset the
>>        first time: I can't figure out the correct data type to use. I
>>        guess I should use a byte type, since the strings are converted
>>        to bytes.
>>
>>        Håkon
>>
>>        On 19 March 2010 15:29, Peter Cao <[email protected]> wrote:
>>
>>           Håkon,
>>
>>           There was a typo in my previous email. You do NOT need to
>>        read the
>>           first chunk in order
>>           to write the second chunk. You can just select whatever
>>        chunks you
>>           want to write.
>>
>>           Sorry for the confusion.
>>
>>           Thanks
>>           --pc
>>
>>
>>           Håkon Sagehaug wrote:
>>
>>               Hi Peter
>>
>>               I'm trying to write chunk by chunk, but I'm having
>>               trouble creating the dataset. In the example [1] it's
>>               done like this:
>>
>>               H5.H5Dcreate(file_id, DATASETNAME,
>>                       HDF5Constants.H5T_STD_I32LE, dataspace_id,
>>                       dcpl_id);
>>
>>               The type there is for int, but I can't seem to find the
>>               correct one for string. In example [2] with string
>>               arrays it looks like this:
>>
>>               H5.H5Dcreate(file_id, DATASETNAME, filetype_id,
>>                       dataspace_id, HDF5Constants.H5P_DEFAULT);
>>
>>               If I create the dataset like this and then add to it
>>               dynamically, I only get the first byte of each string.
>>               Any tips on what type I should use?
>>
>>
>>               Håkon
>>
>>               [1]
>> http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java
>>               [2]
>> http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datatypes/H5Ex_T_String.java
>>
>>               --
>>               Håkon Sagehaug, Scientific Programmer
>>               Parallab, Uni BCCS/Uni Research
>>               [email protected], phone +47 55584125
>>
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
